hxtorch.snn: Machine-learning-inspired Spiking Neural Network Modeling on BrainScaleS-2

Neuromorphic systems require user-friendly software to support the design and optimization of experiments. In this work, we address this need by presenting our development of a machine learning-based modeling framework for the BrainScaleS-2 neuromorphic system. This work represents an improvement over previous efforts, which either focused on the matrix-multiplication mode of BrainScaleS-2 or lacked full automation. Our framework, called hxtorch.snn, enables the hardware-in-the-loop training of spiking neural networks within PyTorch, including support for auto differentiation in a fully-automated hardware experiment workflow. In addition, hxtorch.snn facilitates seamless transitions between emulating on hardware and simulating in software. We demonstrate the capabilities of hxtorch.snn on a classification task using the Yin-Yang dataset employing a gradient-based approach with surrogate gradients and densely sampled membrane observations from the BrainScaleS-2 hardware system.


INTRODUCTION
Modern high-performance computing (HPC) environments speed up individual workloads with domain-specific hardware accelerators [9]. However, successfully establishing novel computing paradigms depends as much on the availability of a computing substrate as it does on the software infrastructure that provides the means to work with that substrate. In particular, the 'programming model' of neuromorphic hardware takes a very different approach to data processing than that of conventional systems [22,23]. Recently, machine-learning-inspired training methods, and in particular gradient-based optimization of spiking neural networks (SNNs), have become increasingly popular [4,13,14,24,32].
We introduce hxtorch.snn, a PyTorch [25] wrapper library for the accelerated mixed-signal neuromorphic hardware system BrainScaleS-2 (BSS-2) [26]. We discuss and demonstrate our implementation using a simple benchmark task trained with a gradient-based method and the BSS-2 hardware system in the loop. The implementation described here has also been used in a real-world application [2] and will serve as the basis for future experiments.
Here we use the gradient estimation method introduced in [8], which uses the unrolled computational graph determined by the forward pass to perform gradient estimation by injecting hardware voltage measurements and spikes in the backward pass. The overall library, however, is agnostic to this particular choice of gradient estimation method. Other approaches can be implemented, such as using a closed-form analytical formula [16] or the EventProp algorithm [38]. To support this generic framework for gradient estimation, we implement utilities for interpolation and normalization of hardware measurements, conversion of spike observations into dense tensors, and procedures for weight quantization.
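To illustrate the conversion of spike observations into dense tensors mentioned above, the following minimal sketch bins a list of (time, neuron) events onto a regular time grid. It is not the hxtorch.snn implementation; the function name events_to_dense, its parameters, and the example values (times in µs) are hypothetical and chosen only for illustration.

```python
import torch


def events_to_dense(times, neuron_ids, n_neurons, duration, dt):
    """Bin (time, neuron) spike events onto a dense [n_steps, n_neurons] grid.

    Hypothetical helper; multiple spikes of one neuron within the same bin
    collapse to a single entry.
    """
    n_steps = int(round(duration / dt))
    dense = torch.zeros(n_steps, n_neurons)
    # Map each spike time to its time bin and drop events outside the window.
    bins = torch.floor(times / dt).long()
    valid = (bins >= 0) & (bins < n_steps)
    dense[bins[valid], neuron_ids[valid]] = 1.0
    return dense


# Four events; the last one lies outside the 4 µs observation window.
times = torch.tensor([0.5, 1.2, 1.4, 9.0])
ids = torch.tensor([0, 2, 2, 1])
spikes = events_to_dense(times, ids, n_neurons=3, duration=4.0, dt=1.0)
```

A coarser grid reduces the tensor size but merges more events per bin, which is one reason why such a conversion benefits from being configurable per training algorithm.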
The BSS-2 mixed-signal neuromorphic hardware emulates networks of spiking neurons time-continuously in analog circuits [26]. Each of the 512 adaptive exponential integrate-and-fire (AdEx) [7] neuron compartments on BSS-2 is individually parameterized and, in particular, can be configured to resemble leaky integrate-and-fire (LIF) and leaky integrator (LI) neuron dynamics. Each compartment receives input stimuli from a column of 256 exponential synapses in the synapse matrix with adjustable 6-bit weights. Configuration of the on-chip routing mechanism and the synapse matrix allows addressing the neurons' digital spike events to target synapses and, thus, the realization of arbitrary topologies. In addition, artificial neural networks (ANNs) can be emulated by using the neurons in a non-spiking mode, which allows implementing analog vector-matrix multiplications [34,37]. The recordings of spike events and membrane potentials, sampled in parallel via a columnar ADC (CADC), are accessible by the host computer and can be incorporated, e.g., for weight update computation in a hardware-in-the-loop (ITL) [31] fashion. A field-programmable gate array (FPGA) serves as the real-time experiment master and as the interconnect between the neuromorphic chip and a host computer.
We use a layered software approach to abstract hardware usage [22]. At the lowest level, a transport layer and an instruction set of commands to the FPGA, which execute sequentially, are employed. Above, individual hardware entity configuration, e.g., for each neuron, is abstracted into containers residing in a uniform address space for type safety and to remove the need for knowledge of the memory layout. Using this hardware abstraction, a data-flow-graph-based experiment notation, grenade, separates network topology specification, temporal evolution description, and execution to support the accelerated real-time nature of the neuromorphic hardware. On top, front ends for spiking and non-spiking experiments exist. Spiking neural networks can be described in the back-end-agnostic language PyNN [10] targeting computational neuroscience. For offloading ANNs, i.e., analog vector-matrix multiplications, a thin PyTorch machine-learning adapter, hxtorch, was developed [33], separating hardware interaction and machine-learning framework. Similarly, this work adds support for describing SNNs in this machine-learning framework, yielding access to training, inference, and data alteration integrated into PyTorch's ecosystem.

METHODS
For developing a machine-learning framework supporting the emulation of SNNs on BSS-2, we follow the same structure as for the previously developed ANN machine-learning adapter hxtorch [33]. Because of PyTorch's large community, we continue to build our front end upon this framework. Given the accelerated time-continuous nature of the neuromorphic hardware, network topology specification, temporal evolution, and execution need to be separated in the machine-learning front end. In particular, individual numerical operations in a time-grid-based simulation cannot be trivially identified with dynamics on the neuromorphic substrate.
We use the data-flow-graph-based experiment notation [22] as the back end. The machine-learning front end then needs to handle data conversion to and from PyTorch tensors and wrapping of network topology descriptions in a PyTorch-compatible notation. Therefore, hardware interaction and machine-learning framework adaptation are separated.
In PyTorch, models are eagerly executed by the computation of a layer's result when calling its forward method, which implies that the network builds up incrementally. However, the emulation of SNNs in physical time requires knowledge of the complete network topology before experiment execution, to derive a corresponding hardware configuration. Therefore, an additional separation is required.
Since training neural networks benefits from the automatic differentiation provided by the machine-learning framework, we aim to incorporate this into the network description. Within PyTorch, we train with the hardware in the loop by emulating the forward pass on BSS-2 and injecting its results in the backward pass on the host computer. Moreover, the convenience of describing models by a composition of modules, i.e., network layers, should be maintained, which also allows defining network topologies with recurrently connected layers.
When developing a model for the neuromorphic hardware, either from scratch or from a reference model, iterative development is beneficial. Exchanging a simulated model with emulation on BSS-2 requires parameter translations, e.g., weight scaling and adaptation to limited resolution on hardware or neuron model parameter translation to technical hardware parameters. A seamless comparison to and replacement of (parts of) the network by (custom) simulation facilitates this. It also enables hybrid networks, for example, by adding the possibility to introduce layers not representable on the neuromorphic hardware, e.g., neuron models not present on hardware or other arbitrary numerical operations on intermediate results, while retaining the neuromorphic hardware's acceleration for the representable parts.
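The weight scaling and adaptation to limited hardware resolution described above can be sketched as follows. This is an illustrative example only, not the translation used by hxtorch.snn: the function name quantize_weights, the scaling factor w_scale, and the signed range of ±63 (6-bit magnitude, with the sign assumed to be handled separately, e.g., by excitatory and inhibitory synapses) are assumptions. The straight-through estimator used for the gradient is likewise one possible choice rather than the method of this work.

```python
import torch


def quantize_weights(w, w_scale, w_max=63):
    """Scale software weights to a hardware range and round to integers.

    Rounding has zero gradient almost everywhere, so the straight-through
    trick below forwards the rounded value but backpropagates through the
    clamped, scaled value instead.
    """
    scaled = torch.clamp(w * w_scale, -w_max, w_max)
    rounded = torch.round(scaled)
    return scaled + (rounded - scaled).detach()


w = torch.tensor([0.013, -0.4, 2.0], requires_grad=True)
w_hw = quantize_weights(w, w_scale=50.0)  # hypothetical scaling factor
w_hw.sum().backward()  # saturated weights receive zero gradient
```

With these example values, the third weight saturates at the upper limit, so its gradient vanishes while the other two receive the gradient of the linear scaling.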
Lastly, the collection of hardware observables, such as membrane potential recordings or spike trains, should be adjustable per layer depending on the training algorithm. Importantly, the transformation of acquired hardware observables into PyTorch data structures needs to be fully customizable so that it can be tailored to the training algorithm at hand. For example, the transition of sparse time-series data to a dense time grid, the interpolation of sparse membrane measurements, or the mapping of synapse weight parameters between PyTorch and BSS-2 can affect the network's dynamics as well as the learning process.
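As an example of such a transformation, the interpolation of sparse membrane measurements onto a dense time grid could look like the sketch below. It uses plain linear interpolation; the function name interpolate_trace and the sample values are hypothetical, and the interpolation actually applied in hxtorch.snn may differ.

```python
import torch


def interpolate_trace(t_samples, v_samples, t_grid):
    """Linearly interpolate samples (t_samples sorted ascending) onto t_grid."""
    # Index of the first sample at or after each grid point, kept in [1, n-1].
    right = torch.searchsorted(t_samples, t_grid).clamp(1, len(t_samples) - 1)
    left = right - 1
    t0, t1 = t_samples[left], t_samples[right]
    v0, v1 = v_samples[left], v_samples[right]
    # Clamping the fraction holds the edge values outside the sampled range.
    frac = ((t_grid - t0) / (t1 - t0)).clamp(0.0, 1.0)
    return v0 + frac * (v1 - v0)


t_samples = torch.tensor([0.0, 2.0, 4.0])   # sparse measurement times
v_samples = torch.tensor([0.0, 1.0, 0.0])   # measured membrane values
t_grid = torch.arange(0.0, 5.0, 1.0)        # dense grid: 0, 1, 2, 3, 4
v_dense = interpolate_trace(t_samples, v_samples, t_grid)
```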

RESULTS
The software library hxtorch.snn integrates modeling SNNs on BSS-2 into the PyTorch ecosystem. It extends the thin wrapper library hxtorch, which previously only targeted modeling ANNs.
Figure 1 shows the library structure, experiment description, and constitution. Similarly to ANN models in PyTorch, we describe constituents of an SNN model like synapse and neuron layers via custom modules derived from HXModule, containing parameters and representing the corresponding hardware entity. We want to retain PyTorch's eager model construction and execution interface by successively applying modules' forward(input) calls, but we need to separate the construction of the complete model from its execution on the hardware. Therefore, we only register each module's invocation into an Instance, passed to each module on its construction. Instead of returning the actual results, we return Handle-typed promises of future results. The instance is then separately executable, and the promises are filled with the result data of the respective modules after the instance's execution. An instance represents one hardware experiment execution. Upon explicit execution of an instance via run, the network topology is extracted from the registered modules and their invocations and converted into a graph-based description, which is then mapped to a hardware experiment description and executed on hardware. The resulting data is transformed into PyTorch tensors, post-processed via a customizable method, and finally annotated onto the corresponding data handles, thereby being accessible by the user. Since instances of Instance only require hardware allocations upon execution, multiple instances can coexist and be executed sequentially. This allows interleaving hardware-executed model parts and simulated parts as long as no inter-instance recurrence is required.
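The separation of model construction from execution can be illustrated with a small, hardware-free sketch of the promise pattern. The classes below are deliberately simplified stand-ins and do not reproduce the hxtorch.snn classes or their signatures: calling a module only registers the invocation and returns an empty handle, and the handles are filled once the whole collection is executed. The sketch executes in registration order and does not cover recurrence.

```python
class Handle:
    """Placeholder for a result that only exists after execution."""

    def __init__(self):
        self.data = None


class Instance:
    """Collects module invocations so that they can be executed later."""

    def __init__(self):
        self.calls = []  # tuples of (function, input handle, output handle)

    def register(self, func, inp):
        out = Handle()
        self.calls.append((func, inp, out))
        return out


class Module:
    """Calling a module registers the invocation instead of computing it."""

    def __init__(self, instance, func):
        self.instance, self.func = instance, func

    def __call__(self, inp):
        return self.instance.register(self.func, inp)


def run(instance, input_handle, data):
    # Stand-in for mapping the topology to hardware, executing the
    # experiment, and writing the results back into the handles.
    input_handle.data = data
    for func, inp, out in instance.calls:
        out.data = func(inp.data)


inst = Instance()
double = Module(inst, lambda v: 2 * v)
inc = Module(inst, lambda v: v + 1)
x_in = Handle()
y = inc(double(x_in))   # builds the network, computes nothing yet
before = y.data         # still empty before execution
run(inst, x_in, 5)      # afterwards y.data carries the result
```

Because the full list of invocations is known before run is called, a real back end can derive a complete hardware configuration first, which is not possible with purely eager execution.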
The model construction via network entity module invocations allows defining recurrent models by associating each module instance with a network entity and using invocations to connect them, cf. Listing 1.
Listing 1: Construction of a recurrent network. When reusing the same network entity nrn, a recurrent connection is created.

    nrn = hxtorch.snn.LIF(...)
    syn1 = hxtorch.snn.Synapse(...)
    syn2 = hxtorch.snn.Synapse(...)
    x = syn1(input)
    x = nrn(x)  # feed-forward
    x = syn2(x)
    x = nrn(x)  # recurrence

Upon construction, each module is equipped with a custom PyTorch-differentiable function, either directly as a torch.autograd.Function or via a function defining both a simulated forward pass and an implicit backward pass. In the first case, hxtorch.snn injects the hardware data implicitly for use in the backward function; in the second case, hardware observables are provided as an additional function argument. This allows annotating a backward pass to the handle data tensors such that PyTorch's automatic differentiation can seamlessly backpropagate gradients based on hardware observations. When a simulated forward pass is provided, the module can also be used without hardware as long as it does not participate in a cyclic sub-network. This greatly aids network development and translation onto the hardware, since (parts of) a simulated reference network can gradually be interchanged with and compared to hardware entities within the same library.
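The first variant, a custom torch.autograd.Function that uses hardware observations in its backward pass, might be sketched as follows. This is an assumption-laden toy example rather than the function shipped with hxtorch.snn: the class name HardwareSpikes, the argument order, and the numeric values are hypothetical, the neuron's temporal dynamics are omitted, and a SuperSpike-style surrogate derivative is evaluated on the measured membrane values.

```python
import torch


class HardwareSpikes(torch.autograd.Function):
    """Forward returns spikes observed on hardware; backward is a surrogate."""

    @staticmethod
    def forward(ctx, v_model, hw_spikes, hw_membrane, threshold, beta):
        # Keep the measured membrane values for the backward pass.
        ctx.save_for_backward(hw_membrane)
        ctx.threshold, ctx.beta = threshold, beta
        return hw_spikes.clone()

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        # SuperSpike-style surrogate derivative: 1 / (beta * |v - thr| + 1)^2
        surrogate = 1.0 / (ctx.beta * torch.abs(v - ctx.threshold) + 1.0) ** 2
        # Only the first input (the software-side tensor) receives a gradient.
        return grad_output * surrogate, None, None, None, None


v_model = torch.zeros(4, requires_grad=True)   # software-side placeholder
hw_v = torch.tensor([0.0, 0.5, 1.0, 1.5])      # stand-in for measured membrane
hw_z = torch.tensor([0.0, 0.0, 1.0, 1.0])      # stand-in for measured spikes
z = HardwareSpikes.apply(v_model, hw_z, hw_v, 1.0, 10.0)
z.sum().backward()
```

The forward output equals the observed spikes regardless of the software-side input, while the gradient is largest where the measured membrane is close to the threshold.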
Currently, hxtorch.snn supports LIF and LI neuron layers and has access to spike times and membrane voltage recordings of individual hardware neurons. To support the full extent of BSS-2 neuron dynamics, a forward and backward PyTorch implementation of the AdEx neuron model is planned.
A dropout module provides additional machine learning functionality by applying a batch-wise spiking mask to a preceding neuron layer and disabling the spike output on hardware accordingly.
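A software-side view of such a batch-wise spiking mask is sketched below. The sketch assumes that one mask per neuron is drawn for the whole batch and held fixed over time; this reading, the function name batchwise_spike_dropout, and the tensor layout [time, batch, neurons] are assumptions, and no rescaling of the remaining activity is applied.

```python
import torch


def batchwise_spike_dropout(spikes, p, generator=None):
    """Silence each neuron with probability p for the entire batch.

    spikes: tensor of shape [time, batch, neurons].
    Returns the masked spikes and the mask that was applied.
    """
    n_neurons = spikes.shape[-1]
    keep = (torch.rand(n_neurons, generator=generator) >= p).to(spikes.dtype)
    return spikes * keep, keep


gen = torch.Generator().manual_seed(0)
spikes = torch.ones(5, 4, 8)
dropped, keep = batchwise_spike_dropout(spikes, p=0.5, generator=gen)
```

On hardware, the same mask would correspond to disabling the spike output of the masked neurons for the duration of the batch.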
Synaptic connections on BSS-2 are exposed to PyTorch as a Synapse module, representing a projection between neuron layers. This provides a framework for recording additional synapse-related observables, such as spike-time correlation sensor recordings.
As a demonstration of our framework, we consider the low-dimensional Yin-Yang classification task [19] visualized in Figure 2A. It consists of three classes, yin, yang, and dots, each defining an area on the two-dimensional $xy$-plane. Its $i$-th sample is a point $p_i = (x_i, y_i)$, randomly drawn from the corresponding areas and labeled accordingly. As input to the SNN, the sample $p_i$ is translated to spike times of five input neurons [16,19]. As shown in Figure 2B, the point's values $x_i$ and $y_i$ in each direction, as well as their inverses $1 - x_i$ and $1 - y_i$, are scaled linearly to spike times in the time window $[t_\text{early}, t_\text{late}]$. In addition, a bias spike shortly after $t_\text{early}$ is inserted to increase the network activity, which is observed for the duration $T$. In our SNN on BSS-2, a hidden layer of 120 LIF neurons integrates the input events and projects its spikes onto a readout layer of 3 LI neurons, each corresponding to one class, as indicated by Figure 2C. This allows the network to infer a class decision $\hat{c}_i$ for sample $p_i$ by interpreting the maximum membrane value of each output neuron $v_k$ over time as a score for the corresponding class, i.e., $\hat{c}_i = \operatorname{argmax}_k (\max_t v_k(t))$. For each sample, the SNN is emulated for $T = 60$ µs. We use the cross-entropy loss on the max-over-time values $\hat{v}_k = \max_t v_k(t)$ as the objective function and SuperSpike [24] surrogate gradients for training. In hxtorch.snn, the PyTorch model is defined as outlined in Figure 1B. The upper plot in Figure 2D exemplifies the output traces while inferring a single sample. The membrane voltage of the output neuron corresponding to the sample's class has the maximum value over time. The lower plot depicts the achieved accuracy of the SNN on BSS-2 over epochs; it reaches 94.63 ± 0.7% (standard deviation over 15 seeds).
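The input encoding and the max-over-time readout described above can be summarized in a short sketch. The function names and the concrete window and bias times (in µs) are hypothetical, since they are not stated here, and the mapping direction (larger values spike later) is an assumption. The loss combines a maximum over the time axis with PyTorch's standard cross-entropy.

```python
import torch
import torch.nn.functional as F


def encode_sample(x, y, t_early, t_late, t_bias):
    """Return spike times for the five inputs (x, y, 1 - x, 1 - y, bias)."""
    values = torch.tensor([x, y, 1.0 - x, 1.0 - y])
    # Linear map from the value range [0, 1] to the window [t_early, t_late].
    times = t_early + values * (t_late - t_early)
    return torch.cat([times, torch.tensor([t_bias])])


def max_over_time_loss(v_out, target):
    """v_out: readout membrane traces of shape [time, batch, classes]."""
    scores = v_out.max(dim=0).values          # maximum over time per class
    loss = F.cross_entropy(scores, target)
    return loss, scores.argmax(dim=1)         # loss and class decisions


# Hypothetical window of 2 µs to 42 µs with a bias spike at 4 µs.
times = encode_sample(0.25, 0.75, t_early=2.0, t_late=42.0, t_bias=4.0)

# Toy traces: 3 time steps, 2 samples, 3 output neurons.
v_out = torch.zeros(3, 2, 3)
v_out[1, 0, 2] = 1.0
v_out[2, 1, 0] = 0.5
loss, decision = max_over_time_loss(v_out, torch.tensor([2, 0]))
```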

DISCUSSION
We presented the library hxtorch.snn to describe SNN models and their emulation on BSS-2 in the PyTorch ecosystem. It integrates the auto-differentiation capabilities of PyTorch, providing access to the same training methodology as for ANNs. Separation of network construction and emulation allows taking full advantage of the emulation speed-up of BSS-2 and the description of feed-forward and recurrent network topologies. The API is designed for user-defined extension of neuron and synapse types, including customization of the backward pass while removing the need for expert knowledge of the hardware. Moreover, we showcased the library by using it to classify the Yin-Yang dataset [19].
The Yin-Yang samples are encoded in spike events as proposed in [19] and fed into an SNN with one hidden LIF layer followed by an LI readout layer. When emulated on BSS-2, the SNN achieves a classification accuracy of 94.63 ± 0.7% when using a max-over-time loss. Compared to an accuracy of 95.0 ± 0.9% achieved on BSS-2 in [16], where the authors use spiking output neurons and an analytical solution for the gradient to optimize the model, our accuracy is only marginally lower. This supports the design choices of our machine-learning layer hxtorch.snn and successfully demonstrates the implementation.
The source code for the demonstration experiment, hxtorch.snn, and the underlying open-source software stack is available online [11,36]. As part of the EBRAINS research infrastructure [1], we operate BSS-2 as a service allowing researchers to interactively conduct experiments and research on the neuromorphic platform. In [35] we provide and maintain a collection of experiments as interactive learning material for the platform.
Table 1 evaluates the experiment runtime overhead of the software. The surrogate-gradient-based training is dominated by data transformations, mostly from membrane measurement data. However, more data-sparse training algorithms, such as EventProp [38], can avoid much of this conversion overhead. To further optimize performance, adding support for event-based numerical codes would also be beneficial, as it would allow us to avoid converting to a time grid. Here, for example, a port of hxtorch.snn to JAX [6] could offer advantages based on increased flexibility, yielding opportunities for more efficient data structures and codes.
The backward functions currently assigned to each module represent the entire backward dynamics of the module over time. Since this is a constraint for recurrent topologies, we aim to incorporate algorithms and interfaces that allow either assigning a single backward function to a defined sub-network (with the forward pass in mock mode) or, for a defined time grid, assigning each module a function representing one integration step and then building the backward graph by implicit time-unrolling in an outer loop (again with the forward pass in mock mode). Additionally, we plan to provide support for online learning rules like the e-prop learning method [5], involving code generation for the embedded single instruction, multiple data (SIMD) microprocessors on BSS-2.
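The second option, a per-module step function unrolled over a time grid in an outer loop, can be sketched in pure PyTorch as follows. This is only an illustration of the idea and not a planned hxtorch.snn interface: the function names li_step and unroll, the forward-Euler discretization, and all parameter values are assumptions. PyTorch's autograd builds the backward graph implicitly from the unrolled loop.

```python
import torch


def li_step(v, i_syn, x, dt, tau_mem, tau_syn):
    """One forward-Euler step of a leaky integrator with exponential synapse."""
    v_new = v + dt / tau_mem * (i_syn - v)
    i_new = i_syn - dt / tau_syn * i_syn + x
    return v_new, i_new


def unroll(step, inputs, **params):
    """Apply `step` along the first (time) axis and stack the membrane trace."""
    v = torch.zeros_like(inputs[0])
    i_syn = torch.zeros_like(inputs[0])
    trace = []
    for x in inputs:
        v, i_syn = step(v, i_syn, x, **params)
        trace.append(v)
    return torch.stack(trace)


w = torch.tensor(0.5, requires_grad=True)
spikes_in = torch.zeros(20, 1)
spikes_in[2, 0] = 1.0   # a single input spike at time step 2
trace = unroll(li_step, w * spikes_in, dt=1.0, tau_mem=10.0, tau_syn=5.0)
trace.max().backward()  # gradient flows through the unrolled steps
```

Since each step is an ordinary differentiable function, recurrent connections between modules only change which tensors are passed into the next step, not how the backward graph is obtained.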
While currently one Instance instance corresponds to a single hardware experiment, we want to support mixed networks that are only partially executed on hardware by implicitly identifying the sub-networks to run on BSS-2 and executing the remaining parts in software. In addition, we will implement the ability to execute layers partially in order to allow the use of layer sizes that cannot be mapped to BSS-2 due to limited neuron resources. This will enable the pipelined and parallel execution of large layers on multiple chips. Part of the functionality of hxtorch.snn will be consolidated in Norse [27], which will serve as the mock backend and provide spike encoding and decoding schemes as well as helper functions.
Overall, hxtorch.snn is a convenient and flexible tool for implementing and training SNN models on BSS-2, utilizing the auto-differentiation capabilities of PyTorch to enable machine-learning-inspired training methods. With further optimization and support for the upcoming multi-chip systems, it has the potential to greatly facilitate and accelerate machine-learning-inspired SNN research and experimentation.

Figure 1: Software application programming interface (API) of hxtorch.snn, shown exemplarily for the feed-forward network used in the classification of the Yin-Yang dataset [19], where A depicts the data flow in the model and with BSS-2 and B displays the corresponding source code. The network consists of two synapse layers and two LI/LIF neuron layers and is associated with an instance to be executed on hardware. Execution is triggered explicitly via the run function, and the handles x, y, and loss carry their result data only after the execution. In particular, this allows gradient calculation via backward functions attached to the handles' data.

Figure 2: (A) Example samples from the Yin-Yang dataset [19]. The dataset consists of samples defined on the $xy$-plane, each assigned to one of three nonlinearly separable classes yin (orange), yang (blue), and dots (green). (B) The spike encoding of the 2D point from the Yin-Yang dataset depicted as a red star in A. Each dimension of the point and its inverse is translated to a spike event. Additionally, a bias spike is added to increase network activity. (C) The SNN topology used to classify the Yin-Yang dataset. An input layer projects the spike-encoded point onto a hidden layer consisting of LIF neurons. An LI readout layer receives the spike events of the hidden layer. Their maximal membrane values over time are used to infer a decision. (D) The upper plot depicts the analog output membrane voltages while inferring the red sample in A. In the lower plot, the classification accuracy is shown over the test epochs. On BSS-2, the SNN achieves an accuracy of 94.63 ± 0.7%. The code is available as an interactive demo [36].

Table 1: Runtime performance of a training epoch of the Yin-Yang experiment, consisting of 64 batches of 75 samples each, using a host computer with an AMD Ryzen 7 3800X central processing unit (CPU). While the emulated network duration is two orders of magnitude shorter than the total training runtime, the majority of the additional hardware and backend time consists of the analog-to-digital converter (ADC) neuron membrane potential readout. The data transformation to PyTorch tensors is dominated by interpolation of the sparse-in-time ADC data onto a dense time grid. The additional front-end overhead and gradient calculation do not contribute significantly to the total training duration.