Neuromorphic Bayesian Optimization in Lava

The ever-increasing demands of computationally expensive and high-dimensional problems require novel optimization methods to find near-optimal solutions in a reasonable amount of time. Bayesian Optimization (BO) stands as one of the best methodologies for learning the underlying relationships within multi-variate problems. This allows users to optimize time consuming and computationally expensive black-box functions in feasible time frames. Existing BO implementations use traditional von-Neumann architectures, in which data and memory are separate. In this work, we introduce Lava Bayesian Optimization (LavaBO) as a contribution to the open-source Lava Software Framework. LavaBO is the first step towards developing a BO system compatible with heterogeneous, fine-grained parallel, in-memory neuromorphic computing architectures (e.g., Intel's Loihi platform). We evaluate the algorithmic performance of the LavaBO system on multiple problems such as training state-of-the-art spiking neural networks through back-propagation and evolutionary learning. Compared to traditional algorithms (such as grid and random search), we highlight the ability of LavaBO to explore the parameter search space with fewer expensive function evaluations, while discovering the optimal solutions.


INTRODUCTION
At the core of a vast array of problems lies a large function, evaluation of which is the most computationally expensive step: neural network design [19], transportation systems [26], graph neural networks [5], and evolutionary algorithms [17] are some examples.The common attribute in these problems is the computational complexity of the evaluation.Computer scientists and mathematicians dedicate tremendous amounts of time and energy developing and optimizing a plethora of algorithms for finding optimal or near-optimal solutions to such problems.The algorithms are either rooted in the study of the theory of computation [27] or the development of statistical models of relationships between multiple variables [28].The use of Bayesian Optimization (BO), based on Bayes' theorem, belongs to the latter class.Published in 1763, Bayes' theorem [4] revolutionized the field of conditional probability and probabilistic inference.It allows Bayesian systems to use their prior knowledge to construct probabilistic models of the world.There are many applications ranging from communication systems [14], medical diagnosis [11], and quantitative finance [21] to hyperparameter optimization [16][17][18].
In this work, we introduce Lava Bayesian Optimization (LavaBO), which adds support for BO in the open-source Lava Software Framework [8] and makes it available to the neuromorphic community. 1avaBO is the first step towards developing a BO system compatible with fine-grained parallel neuromorphic computing architectures (e.g., Intel's Loihi platform [6]).Specifically, using the CPU-based implementation of LavaBO, we evaluate its algorithmic performance on multiple problems: the classical non-convex case of the Ackley function [2], a classification problem using an evolutionary algorithm (EONS [24]) on the TENNLab framework [20], and a deep spiking neural network performing classification problem on the NMNIST dataset [15] using the SLAYER algorithm [10] from the Lava deep learning library [13].
We compare our results with grid and random search algorithms to highlight LavaBO's ability to uncover optimal solutions with fewer numbers of expensive evaluations of the corresponding blackbox functions.We conclude this article with a detailed discussion of the broader impact of this technology and our plan to implement it on Intel's Loihi 2 neuromorphic chip [12].

ARCHITECTURE OF LAVABO WITHIN THE LAVA FRAMEWORK
Lava Software Framework is an open-source software framework, aimed at lowering the barrier for programming neuromorphic hardware and prototyping neuro-inspired algorithms [8].It achieves this goal by providing the necessary tools and abstractions for creating applications that can run on different types of neuromorphic platforms, such as Intel's Loihi chips.Lava also supports conventional CPUs and GPUs for prototyping and testing purposes.Lava aims to be a modular, composable, and extensible framework that allows researchers to integrate their ideas into a growing algorithms library and build complex neuromorphic applications.One such library built on top of the Lava primitives is the Lava Optimization library.LavaBO is implemented with the same interface structure as the rest of the solvers within the Lava-Optimization library.This leads to a situation where the main interface between users and LavaBO is a wrapper class that initializes, connects, and executes all underlying Lava Processes 2 .Figure 1 shows a detailed look at the structure of LavaBO.The code and a tutorial on how to leverage this tool are accessible at https://github.com/lava-nc/lava-optimization.

Bayesian Solver
The BayesianSolver abstracts lower-level functionalities of the optimization process away and serves as the interface contract between users and developers.The system is highly configurable with multiple acquisition functions, acquisition optimizers, initial point generators, the number of initial points, and the random state.Once the desired parameters are determined and the BayesianSolver is initialized, the final remaining steps are creating a Lava Process and ProcessModel wrapper around the black-box function.The only requirements for these wrappers are that they must receive an array of parameters and return an array of parameters along with the resulting score.

Gaussian Regressor
The Gaussian-Regressor learns through prior knowledge and constructs a statistical model of unknown points and their resultant values.This system is implemented as a floating-point Gaussian regressor based on equation 1 where () is a mean function over a real process  () and  (,  ′ ) is a covariance function over the same  () [22].

Acquisition Function
We will explore and exploit the search space through the acquisition function.The AcquisitionFunction defines and calculates a function on the posterior distribution from the GaussianRegressor that represents uncertainly across the space.LavaBO supports three different acquisition functions: lower confidence bound [23], negative expected improvement [3], and negative probability of improvement [3].

Acquisition Optimizer
The AcquisitionOptimizer determines how points are selected based on the acquisition distribution from the AcquisitionFunction.For example, one could calculate the acquisition function across all points in the space or intelligently or randomly select a subset to calculate.We currently support two methods: random sampling [1] and inverse Hessian matrix estimation [29]. 2 See http://lava-nc.orgfor details about Lava concepts like a Process.

Closing the loop
The process begins at the initial point sampler (IPS), where points are broadly sampled to construct the initial surrogate model of the space within the GaussianRegressor.The Acquisition-Distribution then calculates an acquisition distribution from the GaussianRegressor's posterior distribution that models uncertainty throughout the search space.The AcquisitionOptimizer uses the acquisition distribution to determine which point to select for evaluation.The selected point is transferred from the Bayesian-Optimizer to the user's black-box function where the given hyperparameter configuration will be evaluated.After evaluation, the black-box function transfers the result back to the BayesianOptimizer.The BayesianOptimizer has new information about the underlying search space from the black-box process.Therefore, the next step is to transfer this information to the GaussianRegressor where it will incorporate this result into its statistical model of the space.The newfound posterior distribution from the GaussianRegressor is then sent to the AcquisitionFunction.Here, the posterior distribution is used to update acquisition distribution.Lastly, the acquisition distribution is sent to the AcquisitionOptimizer where the next point is determined.

RESULTS
To highlight the capability of LavaBO, we chose three experiments for optimizing: (1) a standard two dimensional function, the Ackley function [2].
(2) hyperparameters of an evolutionary algorithm for training a spiking neural network to classify the IRIS dataset [7].(3) hyperparameters of a backpropagation-based learning algorithm, SLAYER from the Lava deep learning library [13].We use random search or grid search algorithms as a baseline performance measurement for all experiments.Table 1 summarizes the results from all experiments.In the subsections following the table, we discuss the details of each experiment.A baseline random search was also conducted 10 times over the search space with each run consisting of randomly sampling and evaluating 50 points from the space.The probability distributions of the observed points for both algorithms are shown in Figure 3.

The Ackley function
There are stark performance differences between random search and LavaBO.As expected, the probability density function from the random search closely mirrors a uniform distribution.LavaBO's observations are tightly clustered around the minimum of the Ackley function, with a small percentage of the observations used in the fringes of the search space.

Evolutionary Learning for IRIS Classification
In this experiment we compare performance of LavaBO and an exhaustive grid search on optimizing hyperparameters of an evolutionarybased learning algorithm (EONS) [25] to solve a classification task on IRIS dataset [7].The details of the hyperparameters and search  space can be found in Table 2.We completed this task within the TENNLab framework [7].The grid search was performed over all 432 parameter combinations; however, the LavaBO was only ran for 20 iterations.We repeated the LavaBO process 10 times and averaged the final results to gain a broader understanding of the stability of the system.This is shown in Figure 4. LavaBO was able to learn the hyperparameter space and achieve 96.3% accuracy after 16 Bayesian iterations.

SLAYER on Lava-DL for NMNIST Classification
In the third experiment, we demonstrate LavaBO's ability to find optimum hyperparameters of a deep spiking neural network with limited training epochs.We used the Lava deep learning library [15] on the NMNIST dataset which aims to classify event-based handwritten digits.While the native training configuration requires 200 epochs, we limited the number of training epochs during the optimization process to 10 as many applications require models to converge very quickly.Therefore the goal of this experiment is to use LavaBO to find a hyperparameter configuration that achieves the highest validation accuracy during this limited training period.
The search space consists of parameters for voltage threshold of the spiking neurons ('threshold'), decay constants for current and voltage ('current decay' and 'voltage decay', respectively), the  time constant of the spike function derivative ('tau grad'), and the rate of change of the model parameters (the 'learning rate').More information on the specifics of each parameter and their ranges are shown in Table 3.
We ran random search and LavaBO on Lava-DL-SLAYER for NMNIST digit classification for 30 iterations and repeated each hyperparameter combination for three times.We only trained each hyperparameter combination for 10 epochs.Lava BO, on average, converges to a specific subset of the parameter search space that has over twice the accuracy of random search.This suggests that a subset of the space leads to more rapid convergence on the NMNIST digit classification task.Figure 5 highlights this fact where Lava BO is constantly exploring new parameter combinations that have a much higher probability of performing well versus random search where the only high performing combinations are outliers.
Evaluating the implications of our reduced number of training epochs, we chose 3 average configurations representing low, medium, and high performers with medians of 11%, 28%, and 58% after 10 epochs, respectively.These categories and statistics were selected based on the distribution of results from Figure 5.For these parameter combinations we continued the training for 200 epochs.For each combination, this figure also shows the original accuracy values after 10 epochs.The low performing model never improved its accuracy.The medium configuration represents a midpoint between the low and high categories, where the the final accuracy is between the upper and lower categories.Lastly, the high performance model achieved around 90% accuracy after 200 epochs.We conclude that the highest performing models during the initial 10 epochs training have a higher likelihood of performing well at the end of full training sessions (200 epochs).

DISCUSSION
In this work, we introduced the open-access Lava Bayesian Optimization (LavaBO) solver within the Lava Software Framework for neuromorphic computing.We outlined the architecture of the solver and compared its algorithmic performance with traditional algorithms like random search, with the help of three problems: optimization of the Ackley function, hyperparameter tuning for evolutionary algorithms, and hyperparameter tuning for training of deep spiking neural networks.The results consistently show that LavaBO finds optimal hyperparameter combinations in significantly less iterations than random search.
As the natural next step, we are going to extend the LavaBO solver to be compatible with constraints imposed by the Loihi 2 neuromorphic chip.We hope that the extension will enable us to accelerate the execution of the solver using Loihi 2, while significantly reducing the power consumption.Such extension requires us to focus on two aspects: (a) Loihi 2 supports only fixed-point integer arithmetic with limited precision.As a result, we are investigating the effects of rounding on the accuracy of our solver, (b) additionally, for neural implementation of various components of LavaBO (e.g., the Gaussian Process regression), we are exploring the area of hyperdimensional (HD) computing using vector symbolic architectures [9].We hope this enables us to map all components of LavaBO efficiently on a neuromorphic substrate like Loihi 2.

Figure 1 :
Figure 1: A flowchart presenting the programmatic architecture of the Lava Bayesian Optimization system.

Finding the global minimum
of the Ackley function is a classical benchmark problem for non-convex global optimization.The equation for Ackley function is given as 2.  () = − exp − 1  ) + ( + )

Figure 3 :
Figure 3: The probability distributions of observed points from Lava Bayesian Optimization and random search on the Ackley function.The red dots at   = 0 represent the optimum point within the search space.

2 :
The hyperparameter search space for the grid search and LavaBO on the evolutionary learning for IRIS classification problem.Search space size is 432.

Figure 4 :
Figure 4: A comparison of LavaBO and grid search performance in optimization accuracy of IRIS classification using evolved spiking neural networks.

3 :
The hyperparameter search space for the optimization problem with the Lava Deep Learning library, SLAYER, and NMNIST.All parameters have a series of discrete options between the min and max values.The individuals values are separated by the delta.This search space contains 137M parameter combinations.

Figure 5 :
Figure 5: Side by side box plots of observation evaluations at every optimization iteration with Lava BO and random search.The ordering the box plots along the x-axis was sorted by their means.We see that the random search algorithm rarely ever evaluated hyperparameter configurations with greater than 25% accuracy while Lava BO is consistently exploring points over 40% accuracy while only training for 10 epochs.

Figure 6
highlights the accuracy values for training for 200 epochs.

Figure 6 :
Figure 6: Accuracy plots for classifying NMNIST dataset for different hyperparameter combinations using SLAYER for 200 epochs.The three hyperparameter configurations are chosen based on low, medium, and high performing after training for 10 epochs.

Table 1 :
Summary of results from all three experiments detailed in this work.†Inthe case of training of deep SNNs, number of network evaluations to reach the same accuracy was counted.This iso-accuracy level is ∼30% after 10 training epochs, as evident from Figure5.