Hardware-Aware Static Optimization of Hyperdimensional Computations

Binary spatter code (BSC)-based hyperdimensional computing (HDC) is a highly error-resilient approximate computational paradigm suited for error-prone, emerging hardware platforms. In BSC HDC, the basic datatype is a hypervector, a typically large binary vector, where the size of the hypervector has a significant impact on the fidelity and resource usage of the computation. Typically, the hypervector size is dynamically tuned to deliver the desired accuracy; this process is time-consuming and often produces hypervector sizes that lack accuracy guarantees and produce poor results when reused for very similar workloads. We present Heim, a hardware-aware static analysis and optimization framework for BSC HD computations. Heim analytically derives the minimum hypervector size that minimizes resource usage and meets the target accuracy requirement. Heim guarantees the optimized computation converges to the user-provided accuracy target on expectation, even in the presence of hardware error. Heim deploys a novel static analysis procedure that unifies theoretical results from the neuroscience community to systematically optimize HD computations. We evaluate Heim against dynamic tuning-based optimization on 25 benchmark data structures. Given a 99% accuracy requirement, Heim-optimized computations achieve a 99.2%-100.0% median accuracy, up to 49.5% higher than dynamic tuning-based optimization, while achieving 1.15x-7.14x reductions in hypervector size compared to HD computations that achieve comparable query accuracy and finding parametrizations 30.0x-100167.4x faster than dynamic tuning-based approaches. We also use Heim to systematically evaluate the performance benefits of using analog CAMs and multiple-bit-per-cell ReRAM over conventional hardware, while maintaining iso-accuracy -- for both emerging technologies, we find usages where the emerging hardware imparts significant benefits.


INTRODUCTION
Over the years, researchers have developed many emerging memory technologies (e.g., FeRAM, ReRAM, STT-MRAM) that offer non-volatility, better write endurance, and faster write speeds, and support integration into monolithic 3D integrated circuits due to low annealing temperatures [Halawani et al. 2021; Imani et al. 2017b, 2019c; Karunaratne et al. 2020; Poduval et al. 2021; Rahimi et al. 2017; Wu et al. 2018]. Moreover, resistive memories, such as ReRAM, have also been used to build analog in-memory computing fabrics that eliminate data movement by performing computation directly within memory cells, and as memory units in monolithic systems that employ other emerging device technologies (e.g., CNFETs) to realize extremely communication-dense, next-generation hardware platforms. These new technologies offer unprecedented benefits but have not seen broad adoption, as they have much higher bit corruption rates than conventional hardware. These hardware errors often arise from intrinsic properties of the involved materials, and therefore remain a significant problem despite investments from the devices community [Imani et al. 2017b; Shulaker et al. 2014].

Challenges with Approximate Classical Computation. Practitioners from the hardware and software communities have developed a range of techniques for statically and dynamically optimizing classical computations to execute reliably on error-prone hardware [Achour and Rinard 2015; Misailovic et al. 2014; Sharif et al. 2021]. All of these methods grapple with two truths of classical approximate computing: (1) certain bits are essential to the computation and must be retained accurately (e.g., exponent vs. mantissa bits), and (2) some compute operations (e.g., branching) need to execute accurately to obtain a usable result. As a result, these techniques typically require computations and data to be partitioned into precise/approximation-amenable regions, where precise data and compute are either run separately on reliable hardware or run with a number of protection mechanisms (e.g., redundancy, ECC) that guard against bit corruptions. These requirements complicate the architectural designs of these platforms and introduce overheads into the computation. These inefficiencies, and the relative difficulty of statically propagating error through programs without over-approximation, make it exceedingly difficult to soundly and efficiently perform computation on emerging hardware.
1.1 Hyperdimensional Computing / Vector Symbolic Architectures

Hyperdimensional Computing (HDC), or alternatively Vector Symbolic Architectures (VSA), is an approximate computing paradigm well-matched to error-prone, emerging computing platforms. The basic unit of data is a hypervector - a large numeric or binary vector - that distributes program information evenly across bits/values. There are many variants or dialects of HDC; this work focuses on the binary spatter code (BSC) variant of HDC that works with dense binary hypervectors [Kanerva et al. 1997]. BSC HDC offers three key computational characteristics which together enable sound and robust approximate computing on emerging technologies:

▶ Distributed Data Representation. All hypervector bits are equally important, and all bit errors have the same effect on the hypervector result. Therefore, a single-bit flip has the same effect on the computed result, regardless of where it occurs in memory or within the computational pipeline.

▶ Distance-Based Computation. Information is encoded over the relative similarity/dissimilarity of hypervectors, where the similarity of two hypervectors is computed with the Hamming distance metric. The Hamming distance is highly resilient to bit errors, as many bit corruptions are required to substantially influence the calculated distance.

▶ Simple Operators. The basic HD operators are implemented with circular shift and bit-wise XOR and majority operations. These operators are both hardware-efficient and amenable to analysis.
In contrast, in classical computation, a single bit error can have an outsized effect on a computational result, the sensitivity of the final result to error is workload-dependent, and it is typically difficult to statically analyze the propagation of bit errors through the program without over-approximation.
Applications. HD computing has enjoyed increased attention in the hardware and software research communities [Imani et al. 2018, 2019a,b; Kim et al. 2020]. Practitioners have devised HD computations to build a range of data structures, including database records, graphs, trees, and finite-state automata [Osipov et al. 2017; Yerxa et al. 2018], and to perform a number of processing tasks, including signal and language classification, information retrieval, workload balancing, and analogical reasoning [Eggimann et al. 2021; Gayler and Levy 2009; Heddes et al. 2022; Jones and Mewhort 2007; Kanerva 2010; Karunaratne et al. 2020; Kleyko et al. 2020, 2022; Pashchenko et al. 2020; Plate 2000; Rachkovskij and Slipchenko 2012; Simpkin et al. 2019]. HD computation has also been successfully used in recent years to improve the accuracy and efficiency of edge ML models, and to embed intuition about problem structure into ML training tasks [Imani et al. 2017a; Rahimi et al. 2016, 2018; Schlegel et al. 2021, 2022; Theiss et al. 2022].

Challenges with VSA/HDC. One drawback to HD computing is that large hypervectors are usually required to encode information reliably. The hypervector size strongly affects the accuracy of the implemented HD computation and determines the amount of information that can be reliably encoded with the hypervector. Lower-dimensional bit-vectors consume less space but potentially reduce one's ability to retrieve information reliably. Typically, practitioners either leave the hypervector size unoptimized or dynamically tune the hypervector size by running Monte Carlo simulations for each parametrization of the target computation [Kanerva 2009, 2014, 2018; Montagna et al. 2018; Rahimi et al. 2017]. Dynamic tuning is time-consuming and has a tendency to overfit - the chosen hypervector sizes do not generalize well when minor adjustments are made to the computation.

The Heim Optimizer
We present Heim, the first (to our knowledge) static analysis and optimization tool for BSC HD computations. To summarize, the HDC paradigm enables robust computation on error-prone hardware, and Heim delivers accuracy guarantees even in the presence of hardware error. Given a hardware error model and an accuracy specification, Heim derives the smallest hypervector size that meets the specified accuracy requirements on the target hardware platform:

▶ Analysis. Heim deploys a precise and sound static analysis that guarantees the convergence of the accuracy of the HD computation to the desired accuracy on expectation. The analysis uses several novel theoretical results to soundly derive the expected accuracy for HD computations (see Table 1).

▶ Hardware-Aware. Heim optimizes HD computations to execute accurately on hardware platforms that use error-prone and emerging device technologies. Heim's analysis procedure works with a hardware error model and delivers accuracy guarantees in the presence of hardware error.

▶ Robust Optimization. The Heim-derived parametrization is guaranteed to deliver the desired accuracy for all HD computations captured by the accuracy specification. Heim analytically derives important HDC program parameters, including distance thresholds and HD operation-specific hypervector sizes, which together are used to optimize the computation.

Contributions
▶ Heim Accuracy Analysis. We present a novel accuracy analysis that employs new theoretical results to derive the expected accuracy for a set of BSC computations on an emerging hardware technology. The accuracy analysis works with an accuracy specification that supports the description of HD computations and their associated accuracy constraints.

▶ Heim Optimizer. We present an algorithm that uses the above accuracy analysis to statically derive thresholds and hypervector sizes that minimize resource usage for a given HD computation while satisfying the accuracy constraints provided in the Heim accuracy specification.

▶ Evaluation. We evaluate Heim against dynamic tuning-based optimization on 25 data structures.
Given a 99% accuracy requirement, Heim-optimized computations achieve a 99.2%-100.0% median accuracy, up to 49.5% higher than dynamic tuning, and achieve 1.15x-7.14x reductions in hypervector size compared to iso-accuracy dynamically tuned executions. Heim also finds parametrizations 30.0x-100167.4x faster than dynamic tuning-based approaches. We also use Heim to find optimized computations at iso-accuracy for two emerging hardware technologies and find usages where the emerging hardware imparts significant benefits over the classical HDC implementation.

HYPERDIMENSIONAL COMPUTING
Hyperdimensional computing (HDC) is a highly error-resilient, brain-inspired computational paradigm. In HDC, information is encoded by computing over randomly generated vectors corresponding to symbols (e.g., letters, colors, objects) in the application domain using binding, bundling, and permutation operations. A hypervector is a numerical vector which may contain binary, modular integer, complex, or real values depending on the HDC variant. Information is retrieved from HD-encoded data by computing the distance dist(h, h′) between hypervectors, where hypervectors with small distances are similar. The distance threshold thr determines the cutoff point that separates a "small" and a "large" distance. Because information is evenly distributed across hypervector bits, and distance calculations are resilient to error, HD queries can complete successfully even when bit corruptions occur.
BSC HDC. This work targets the binary spatter code (BSC) HDC variant, which works with dense binary hypervectors. Random hypervectors are generated by sampling bits from a p = 0.5 Bernoulli distribution, and permutation, binding, and bundling operations are implemented with circular shifts, bit-wise exclusive-OR (XOR), and bit-wise majority operations, respectively. The Hamming distance dist(h, h′) = (1/N) Σ_i (h_i ⊕ h′_i) is the BSC distance measure, where N is the hypervector size. The bit-wise majority operation computes whether there is a majority of "1" or "0" bits at each bit position, and can alternatively be interpreted as an element-wise sum followed by a thresholding operation.
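The BSC primitives above can be sketched in a few lines of NumPy. This is an illustrative sketch with our own naming (random_hv, bind, bundle, permute, dist), not Heim's implementation; ties in the majority vote are broken toward 1 for simplicity.

```python
import numpy as np

def random_hv(rng, n=10_000):
    # Sample each bit from a p = 0.5 Bernoulli distribution.
    return rng.integers(0, 2, size=n, dtype=np.uint8)

def bind(h1, h2):
    # Binding: bit-wise XOR.
    return h1 ^ h2

def bundle(*hvs):
    # Bundling: bit-wise majority vote (ties broken toward 1).
    return (2 * np.sum(hvs, axis=0) >= len(hvs)).astype(np.uint8)

def permute(h, i=1):
    # Permutation: circular shift by i positions; permute(h, -i) inverts it.
    return np.roll(h, i)

def dist(h1, h2):
    # Normalized Hamming distance: fraction of differing bit positions.
    return float(np.mean(h1 != h2))
```

Two independent random hypervectors sit at distance ≈ 0.5, a bound hypervector is dissimilar to both of its inputs, and a bundle stays similar to each input, matching the properties described above.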
Operators. The binding and bundling operations (h ⊙ h′ and h + h′) respectively produce hypervectors that are dissimilar and similar to the input hypervectors h, h′. The permutation operation h′ = ρ^i(h) produces a hypervector h′ that is dissimilar to the input hypervector h, where the original hypervector can be recovered by inverting the permutation (h = ρ^{-i}(h′)) and i is an integer value. Generally, binding and permutation operations distribute over bundling, and for BSC HDC, binding and bundling are commutative and associative, and binding is invertible.
Codebooks. These operations are performed over a "codebook" of basis hypervectors, which represent atomics in the problem domain. Examples of codebooks include letters of the alphabet (K=26), numerical digits (K=10), graph nodes (K=# nodes), and primary colors (K=3). Each basis hypervector, or code, in the codebook is typically randomly generated; the associated atomics (e.g., letters) are, therefore, dissimilar from one another. Conceptually, this dissimilarity captures the idea that atomics are distinct - the letter A is distinct from the letter C, for example. The HD permutation, binding, and bundling operations are then applied to encode information using these atomics. For example, bundling the "A" and "B" basis hypervectors (h = h_A + h_B) produces a hypervector similar to both A and B.

Data Structure/Query Interpretation of HD Computing
An HD computation can be thought of as a data structure encoding operation that produces a data structure hypervector ds that can be queried by computing its distance from a query hypervector q. Because hypervectors are lossy information encodings, a query against an HD data structure may occasionally return an incorrect result - the accuracy of a query is the probability that a query returns the correct result. The hypervector size and the distance thresholds together control the computation's accuracy.
Data Structures. The bundling operation conceptually creates a set of elements where the membership of an element or subset can be tested with a distance calculation. For example, for the set ds = h_A + h_B + h_C ≈ {A, B, C}, the membership of a subset q = h_A + h_B ≈ {A, B} is tested by computing the distance dist(h_A + h_B, h_A + h_B + h_C). If the distance falls below a distance threshold thr, the set contains the subset; this is referred to as a match. A false positive occurs when a data structure hypervector falsely matches the query, and a false negative occurs when a data structure hypervector falsely fails to match the query. The binding operation conceptually creates a record of elements (h_A ⊙ h_B ∼ ⟨A, B⟩), where each record is only similar to other matching records, and the permutation operation is used to encode positional information into the HD data structure. Several complex data structures that compose sets, sequences, and records can be built from these basic operations. For example, h_A + ρ^1(h_B) builds the sequence [A, B] that can be indexed at index 1 by computing ρ^{-1}(h_A + ρ^1(h_B)), the h_A ⊙ h_B + h_C ⊙ h_D encoding builds the set of records {⟨A, B⟩, ⟨C, D⟩} that can be queried with record subsets, and the h_A ⊙ ρ^1(h_B) encoding builds a tuple ⟨A, B⟩ of ordered elements, such that ⟨A, B⟩ ≠ ⟨B, A⟩.
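The subset-membership test described above can be demonstrated directly. This is a sketch with our own helper names, and the 0.3 threshold is illustrative, not a Heim-derived value:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10_000
hv = lambda: rng.integers(0, 2, size=N, dtype=np.uint8)

def bundle(*hs):
    # Bit-wise majority vote (ties broken toward 1).
    return (2 * np.sum(hs, axis=0) >= len(hs)).astype(np.uint8)

def dist(h1, h2):
    return float(np.mean(h1 != h2))

hA, hB, hC, hD = hv(), hv(), hv(), hv()
ds    = bundle(hA, hB, hC)   # ds ≈ {A, B, C}
q_in  = bundle(hA, hB)       # {A, B}, a subset of {A, B, C}
q_out = bundle(hA, hD)       # {A, D}, not a subset of {A, B, C}

# A matching subset lands well below the no-match distance, so a
# threshold between the two distributions separates match from no-match.
assert dist(q_in, ds) < 0.3 < dist(q_out, ds)
```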
Item Memories. Complex data structures, such as graphs and databases, can be encoded through the use of an item memory, a key-hypervector data store that maps identifiers to hypervector item memory rows h_i, which implements HD data structures. For example, an HD graph item memory maps nodes to hypervectors, and each hypervector "value" encodes the set of edges connected to the associated "key" node. Item memory-based data structures support two queries: (1) threshold-based queries and (2) w-winner winner-take-all queries. For both queries, the distance between the query hypervector and each item memory row dist(q, h_i) is computed. In threshold-based querying, all item memory rows that match a query hypervector q are returned. In winner-take-all querying, the w item memory rows closest to the query hypervector are returned. Threshold queries require a distance threshold thr to operate, which may be individually set for each item memory row, while winner-take-all queries take no additional parameters. See Section 10 for more discussion on these two query types.
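The two query types can be sketched over a dictionary-backed item memory (a minimal stand-in of our own, not Heim's item memory implementation):

```python
import numpy as np

def dist(h1, h2):
    # Normalized Hamming distance.
    return float(np.mean(h1 != h2))

def threshold_query(q, item_memory, thr):
    # Return every key whose row hypervector falls within thr of the query.
    return [k for k, row in item_memory.items() if dist(q, row) < thr]

def wta_query(q, item_memory, w):
    # Return the w keys whose row hypervectors are closest to the query.
    return sorted(item_memory, key=lambda k: dist(q, item_memory[k]))[:w]
```

Note the asymmetry described above: threshold_query may return any number of rows but needs a threshold, while wta_query always returns exactly w rows and takes no additional parameters.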

ILLUSTRATIVE EXAMPLE: KNOWLEDGE GRAPH
Knowledge graphs capture networks of real-world entities (objects, people, situations) and model relationships between them. An outgoing edge indicates the originating node is acting on a target node, and an incoming edge indicates the receiving node is the target of another node. Nodes map to concepts (e.g., apple, mary, tennis), and edges are labeled with relations (e.g., plays, likes, hates).

HD Knowledge Graph. Figure 1b presents the HD encoding of the student knowledge graph from Figure 1a (Lines 2-6). This encoding works with codebooks that specify the relations {likes, plays}, the interactions {act, target}, and the concepts {jack, mary, banana, apple, tennis} that may appear in the student knowledge graph. Each node's edge information is then encoded as a hypervector ds[node] in item memory. A hypervector that encodes each incoming and outgoing edge is constructed by binding together interaction, relation, and concept tuples (e.g., act ⊙ likes ⊙ apple). The hypervector that encodes the set of edges connected to a given node is constructed by bundling (+) all of the edge hypervectors containing the target node together. In BSC, binding distributes over bundling.
Each edge set hypervector is then stored in an item memory. The keys in the knowledge graph's item memory map to concepts, and the hypervectors are the constructed edge lists. In this example, the item memory hypervectors are stored in 2-bit-per-cell resistive RAM (ReRAM). This storage medium is 2x denser than conventional binary storage but has a 2.15% chance of corrupting a bit in memory - this error rate is collected from a real ReRAM array (Section 8). Therefore, the emerging memory may sporadically corrupt bits at random positions in the hypervector data structure.

Queries. We now want to query the knowledge graph to answer the following question: "How many students like apples?". To answer this question, we would count how many nodes have outgoing edges with the likes relation label pointing to apple - this will be referred to as the apples query.
We want the apples query to complete with 99% accuracy, even in the presence of hardware error. This query can be dispatched on the hypervector representation of the student knowledge graph. We construct the apples query hypervector by binding together the relation, concept, and interaction hypervectors (Line 8) - this is the same encoding used for the edges in the knowledge graph. We then determine if a node hypervector contains the query tuple by computing the Hamming distance dist(query, ds[node]) between the query hypervector and the node hypervectors in item memory (Line 10). Node hypervectors with a distance below some node-specific distance threshold thr contain the ⟨act, likes, apple⟩ query tuple in their edge set, and the corresponding key is returned as a match. This query cannot be expressed as a winner-take-all query, as the number of matches is unknown.
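The encoding and query walk-through above can be sketched end to end. This is a simplified, error-free sketch with a toy two-node graph; the codebook contents mirror Figure 1b, but the helper names and the 0.4 threshold are illustrative, not Heim-derived:

```python
import numpy as np

rng = np.random.default_rng(3)
hv = lambda: rng.integers(0, 2, size=10_000, dtype=np.uint8)

# Codebooks for interactions, relations, and concepts.
cb = {s: hv() for s in ["act", "target", "likes", "plays",
                        "jack", "mary", "banana", "apple", "tennis"]}

bind   = lambda *hs: np.bitwise_xor.reduce(hs)
bundle = lambda *hs: (2 * np.sum(hs, axis=0) >= len(hs)).astype(np.uint8)
dist   = lambda a, b: float(np.mean(a != b))

# Each node hypervector bundles its edge tuples (interaction (+) relation (+) concept).
ds = {
    "jack": bundle(bind(cb["act"], cb["likes"], cb["apple"]),
                   bind(cb["act"], cb["plays"], cb["tennis"])),
    "mary": bundle(bind(cb["act"], cb["likes"], cb["banana"]),
                   bind(cb["act"], cb["plays"], cb["tennis"])),
}

# "How many students like apples?" -- the query uses the same edge encoding.
query = bind(cb["act"], cb["likes"], cb["apple"])
matches = [node for node, h in ds.items() if dist(query, h) < 0.4]
```

Because jack's node hypervector bundles the queried edge tuple, its distance to the query concentrates well below 0.5, while mary's stays near the no-match level, so only jack is returned.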

Naive Query Optimization
We are interested in minimizing the hypervector size to reduce the memory usage while still attaining this 99% accuracy target. Typically, this is done by dynamically tuning the size and distance thresholds to execute the provided query with acceptable accuracy. Figure 2a presents a dynamic tuning algorithm that finds a minimum hypervector size between zero and nMax and the associated distance threshold that achieves a query accuracy of accReq over a test set of labelled matching/non-matching HD queries and data structures (tests). The algorithm performs a binary search over hypervector sizes. For each size, the algorithm executes each test query and data structure for nCbs × nTraces Monte Carlo trials to build up a dataset (data) of query-item memory distances. The algorithm then uses a brute-force search to find the distance threshold that maximizes the accuracy, or the fraction of correctly classified samples over the constructed dataset. In total, the dynamic tuning algorithm executes log₂(nMax) × |tests| × nCbs × nTraces Monte Carlo trials of the computation.
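The tuning loop described above can be sketched as follows. This is our reconstruction of the algorithm's shape, not the code in Figure 2a; run_trials stands in for the Monte Carlo executions over codebooks and error traces:

```python
import numpy as np

def best_threshold(data):
    # Brute-force search: try each observed distance as the threshold and
    # keep the one maximizing classification accuracy (matches below the
    # threshold, non-matches at or above it).
    best_thr, best_acc = 0.0, 0.0
    for cand, _ in data:
        acc = float(np.mean([(d < cand) == is_match for d, is_match in data]))
        if acc > best_acc:
            best_thr, best_acc = cand, acc
    return best_thr, best_acc

def tune(tests, acc_req, n_max, run_trials):
    # Binary search for the smallest hypervector size whose best
    # threshold reaches acc_req over the Monte Carlo trial data.
    lo, hi, best = 1, n_max, None
    while lo <= hi:
        n = (lo + hi) // 2
        # run_trials yields (distance, is_match) samples for every test
        # at size n, over nCbs codebooks and nTraces error traces.
        thr, acc = best_threshold(run_trials(tests, n))
        if acc >= acc_req:
            best, hi = (n, thr), n - 1
        else:
            lo = n + 1
    return best
```

Because the accuracy at each size is estimated from random trials of the given test set, the chosen size and threshold can overfit that test set, which is exactly the generalization problem discussed below.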
Accuracy. We parameterize the dynamic tuner with a test set containing the "apples" query and knowledge graph data structures, over 30 random codebooks and 30 error traces - in total, 900 trials. The dynamic tuner completes in 35.4 seconds (averaged over ten runs) and finds a hypervector size of 117 bits and a threshold of 0.402 that empirically delivers a query accuracy of 99.044% on the test set. With these dynamically tuned parameters, we attain an 85.47x reduction in the hypervector size, compared to the unoptimized 10,000-bit hypervector size used in previous literature [Kanerva 2009, 2014, 2018; Montagna et al. 2018; Rahimi et al. 2017] - dynamic tuning therefore significantly reduces the memory footprint of the knowledge graph item memory. While this parameterization delivers the desired accuracy for the apples query, it does not generalize well to other knowledge graphs or queries of comparable complexity. To demonstrate this, we randomly construct 1000 knowledge graphs containing five nodes with a maximum degree of 4, generate five random edge queries for each graph, and then evaluate the accuracy of the edge queries over 900 trials. Over these 5000 randomly generated data structure-query combinations, we find that the accuracies for 0-degree, 1-degree, 2-degree, 3-degree, and 4-degree nodes are 100.0%, 98.8%, 98.8%, 98.9%, and 97.6%, respectively. Notably, queries over 1-4 degree nodes fail to meet the 99% accuracy target when the dynamically tuned parametrization is used. This issue can be addressed by dynamically tuning over randomly generated queries and data structures; however, doing so drastically increases parameter tuning time.

Optimizing the Apples Query with Heim
With Heim, we can statically optimize the size and distance thresholds for a given HD computation without performing any simulation.Heim delivers precise accuracy guarantees for data structure queries that generally hold, on expectation, over different query and data structure instantiations and can deliver these guarantees even in the presence of hardware error.
HD Specifications. Heim works with a specification of the HD computation that describes the accuracy requirements for a set of data structure queries. Figure 2b presents a Heim specification that requires that all edge queries over 5-node knowledge graphs with a maximum degree of 4 achieve a query accuracy of at least 99%. The apples query on the student knowledge graph presented in Figure 1a is an example of a concrete data structure query that adheres to this specification. Line 2 defines a query as a product of interactions (itypes), relations (rels), and concepts (concepts), and Line 3 defines a node hypervector in item memory as a bundle (sum) of up to 4 edge hypervectors, where each edge tuple is a binding of an interaction, relation, and concept. The accuracy assertion on Line 4 requires all tuple queries made to a node hypervector to return the correct result at least 99% of the time, with maximum false positive (incorrect match) and false negative (incorrect non-match) rates of 1%. See the supplementary materials for more information on the analysis-amenable knowledge graph data structure.
Heim considers the effect of hardware error during optimization and works with a hardware error model that captures the error rates of the target device. Figure 2c presents the hardware error model for the 2 BPC ReRAM-based accelerator we are targeting in this example; this model defines the per-bit corruption probability for data in item memory as 0.0215. All other operations are error-free.
Heim-Optimized Parameters. We use Heim to identify an optimal threshold and hypervector size for the specification in Figure 2b and the hardware error model in Figure 2c. Heim completes its analysis in 13.58 milliseconds (2606.8x faster than dynamic tuning) and returns a hypervector size of 173 and distance thresholds of 0.4116, 0.3795, 0.3795, and 0.1744 for nodes with degrees 4, 3, 2, and 1. The Heim-optimized apples query attains an accuracy of 99.944%. Heim, therefore, meets the 99% accuracy target and delivers a 57.80x reduction in hypervector size over unoptimized 10,000-element vectors, reducing the number of 2 BPC ReRAM cells required to store each node hypervector from 5,000 cells to just 87 cells. The Heim hypervectors are 1.48x larger than the dynamically tuned hypervector size but much more reliably deliver the desired accuracy across data structures. In fact, Heim guarantees that the derived threshold and hypervector size will classify edge queries on node hypervectors (with node degree ≤ 4) with at least 99% accuracy on expectation. We evaluate the Heim-optimized HD computation over the random knowledge graphs and queries described in Section 3.1 and find that Heim empirically attains accuracies of 100.0%, 100.0%, 99.9%, 99.9%, and 98.9% for 0-degree, 1-degree, 2-degree, 3-degree, and 4-degree nodes, respectively, all close to or higher than the target accuracy of 99%. Therefore, while Heim's hypervector size is larger than the dynamically tuned hypervector size, the Heim-tuned computation more reliably meets the 99% accuracy constraint.

HEIM SPECIFICATION LANGUAGES
The Heim accuracy specification language enables practitioners to specify the structure of HD programs to optimize. The specification language supports the specification of abstract programs that capture a set of HD computations. The language supports describing HD computations with the following statements:

HD Expression. Each abs-data v = Expr statement maps an HD expression Expr to a variable v. The HD expression statically describes the structure of the HD computation to analyze. We break up HD expressions into simple and complex HD expressions. A simple HD expression can be a code, a permutation of a code, where the basic permutation operator perm specifies the number of times to apply or unapply the permutation, or a tuple of (permutations of) codes. A complex HD expression is either a bundle of simple expressions (a sum of simple expressions) or a binding of several bundles of simple expressions (a prod of sums of simple expressions). All sum expressions specify the maximum number of hypervectors that will be summed together - this is necessary for Heim to complete its analysis.
Thresholding Query Accuracy Constraint. The thr-query(q, ds, k, x, x′, x″) statement imposes the requirement that the thresholding query |q ∩ ds| ≥ k produces an accurate result with a probability of at least x. Intuitively, this formulation checks that thresholding on dist(q, ds) can correctly determine whether at least k elements in the query q are contained within the data structure ds, with a probability of at least x. The statement also defines the maximum probability of a false positive (x′) and false negative (x″) occurrence.
Winner-take-all Query Accuracy Constraint. The wta-query(q, ds, k, w, m, x, t, x′) statement imposes the requirement that a WTA query produces an accurate result with a probability of at least x. Specifically, the WTA query has a query of the form q and m data structures of the form ds in item memory, out of which exactly w match the query by satisfying |q ∩ ds| ≥ k. The WTA query returns the w of the m data structures with the smallest distances to the query. We define the result as accurate when the returned w rows are exactly the w positives. The statement also specifies a softer constraint that the w true matches are contained within the top t lowest distances to the query (t ≥ w), with probability x′.

Hardware Error Model
Heim works with a hardware error model that describes the error rates for the basic HD computational operators. Statements of the form ⟨operator⟩ = x define the per-bit error rates for the bundling (bundle), binding (bind), and permutation (perm) operators. Statements of the form ⟨memory⟩ = x define the per-bit error rate associated with storing data in item memory, in codebook memory, or in the query buffer. The item memory data storage location supports in-memory distance calculations, the query buffer stores the query to apply to item memory, and the codebook memory stores the basis vectors for the codebooks.
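A per-bit storage error of the kind the model describes can be simulated by flipping each stored bit independently. This is a sketch; corrupt is our own helper, and the 0.0215 rate is the 2 BPC ReRAM error rate from the running example:

```python
import numpy as np

def corrupt(h, p_bit, rng):
    # Flip each bit independently with probability p_bit, modeling the
    # per-bit error rate of an error-prone storage location.
    flips = (rng.random(h.shape) < p_bit).astype(h.dtype)
    return h ^ flips

rng = np.random.default_rng(4)
h = rng.integers(0, 2, size=100_000, dtype=np.uint8)
h_stored = corrupt(h, 0.0215, rng)  # e.g., item memory on 2 BPC ReRAM
```

Because errors land uniformly at random across the distributed representation, their aggregate effect on a Hamming distance is a predictable shift, which is what makes the error rate amenable to Heim's static analysis.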

Table 1. Formulations used in the Heim accuracy analysis.

WTA, w = 1 (6) | [Frady et al. 2018] | WTA accuracy for exactly one winner (w = 1)
WTA (10) | Section 5.4 | WTA accuracy for more than one winner (w > 1)
WTA (12) | Section 5.4 | probability of the w positives being in the top t
QDS I (14) | [Kanerva et al. 1997] | single-element sum-of-product set membership
QDS II (15) | [Kleyko et al. 2016] | subset sum-of-product set membership
QDS III (17) | ... | ...

Fig. 4. Visualization of WTA/threshold query over match/no-match distance distributions. Points map to sampled match and no-match query-item memory row distances for a 10-element item memory, where match/no-match distances are sampled from the match/no-match distance distributions. Circled points map to correct row matches for the query.

HEIM ACCURACY ANALYSIS
At the heart of Heim is a novel static accuracy analysis that derives the query accuracy for threshold-based and winner-take-all data structure queries. The analytically derived accuracy is both precise and sound - Heim's analysis guarantees that the query under study will converge to the computed accuracy on expectation. This accuracy analysis works with analytical models of the query-data structure distance distributions that are parametrized over the hypervector size, the hardware error model, and the query and data structure expressions - these models are used to analytically derive the accuracy of each type of query. For the Heim analysis to be sound, Heim requires certain mutual independence constraints to hold over the query and dataset. Sections 5.1-5.2 overview the relationship between distance distributions and query accuracy, Sections 5.3-5.4 present the accuracy analysis, and Section 6 presents the analytical distance distribution models. Table 1 summarizes the novel and previously published theoretical results used in this analysis.
Heim Optimizer. The Heim accuracy analysis is used by the Heim optimizer to find the minimum hypervector size for a given Heim specification. The Heim optimizer returns a set of query-specific thresholds that can be used to more accurately query the data structure, and a query-specific hypervector size that can be used to soundly do partial computation. Heim also returns a set of query-specific mutual independence constraints that must hold for the above analysis to be fully sound; these constraints can be optionally checked at runtime with the Heim mutual independence checking algorithm. Section 7 presents the Heim optimization algorithm, and Section 7.1 presents the algorithm for checking that the mutual independence constraints hold over concrete data structures and queries.

Intuition: Accuracy of Threshold-Based Queries
Consider the following Heim accuracy constraint over a threshold query thr-query(q,ds,k,x,x',x"), where q is the query HD expression and ds is the HD expression of the rows in a data structure's item memory. Each item memory row is a match if it contains at least k elements in the query expression, and a not-match otherwise. Figure 4 presents the match (blue) and not-match (red) query-data structure distance distributions. Each distance distribution is normally distributed, with a mean μ(q, ds, k, hw) that depends on the query and data structure expressions, the number of matching elements, and the hardware error model, and a standard deviation σ(μ, N) that is a function of the mean and the hypervector size.
Example. In Figure 4, the points on the match and not-match distributions correspond to the query-data structure distances for a ten-element item memory with five match rows (blue, circled) and five not-match rows (red), given a sampled set of codebooks and error trace. Each point on the distance distribution corresponds to a distance between the query hypervector and a row hypervector in item memory. In threshold-based querying, points to the left of the distance threshold (grey line) are returned as a match, and points to the right are returned as not a match. The above example correctly returns four of five matches and incorrectly returns one not-matching distance as a match, corresponding to one false positive and one false negative. The accuracy of the threshold query corresponds to the probability of correctly classifying each row in the data structure's item memory.
In the above example, 8/10 item memory rows are correctly classified, yielding an 80% accuracy.
Expected Accuracy. The associated match and not-match distributions can be analyzed to compute the expected accuracy. The overlapping area between the match and not-match distributions is the probability that a query is misclassified. Because the distance distributions overlap, there is ambiguity as to whether the query matches a given data structure row. The accuracy is, therefore, the probability that a distance is sampled from the non-overlapping regions of the match and not-match distributions. The false positive rate is the portion of the overlapping area to the left of the chosen threshold, and the false negative rate is the portion of the overlapping area to the right of the chosen threshold.
The degree of separation between the two distributions depends on how the distributions' mean and standard deviations are parameterized.

Intuition: Accuracy of Winner-Take-All Queries
Consider a winner-take-all (WTA) query accuracy constraint wta-query(q, ds, k, w, m, x, t, x') with w winners over an m-row item memory. The q, ds, and k arguments correspond to the query and data structure expressions and the number of query elements that must be contained in an item memory row for it to be considered a match. In w-winner WTA queries, there are precisely w true matches sampled from the match distribution. We note (1) that within matches sampled from the match distribution, there is no order to the sampled matches, and (2) that WTA queries are not used to query not-matching elements.
Example. Figure 4b shows a 5-winner WTA query over a ten-element item memory. The above accuracy constraint requires all true winners (circled) to be contained within the top five lowest distances (left of the top w=5 line) with probability x, and all true winners to be contained within the t=8 lowest distances (left of the top t=8 line) with probability x'. Intuitively, the former requirement requires that only rows that are true matches be returned for a WTA query, with probability x. The latter is a soft requirement that ensures all true matches are contained within the top t lowest distances with probability x', where t ≥ w. The above figure violates the hard constraint, since the top 5 lowest distances contain one not-matching row, and satisfies the soft constraint, since all true matches are in the top 8 lowest distances.
Expected Accuracy. Next, we provide an intuitive explanation of the expected accuracy. First, we draw w distances from the match distribution and m−w distances from the not-match distribution.
Intuitively, the expected accuracy of a w-winner WTA query corresponds to the probability that all w matching distances are to the left of all m−w not-matching distances. The probability that the soft accuracy requirement is satisfied corresponds to the probability that the top t distances contain all w matching distances. The WTA distance distributions have the same μ and σ parameters as the threshold-based distance distributions.

Threshold-Based Query Accuracy Analysis (thrAccAnalysis)
Given a hypervector size N, a hardware specification hw, and a threshold query constraint thr-query(q, ds, k, reqAcc, reqFp, reqFn), the analysis queries the analytical distance model M(q, ds, k, N, hw) to retrieve the corresponding match and not-match distance distributions Φ_m, Φ_nm and the associated independence constraint c_ind. Heim finds the optimal distance threshold thr_opt that maximizes the accuracy α for the given match (Φ_m) and not-match (Φ_nm) distributions and the upper bounds on the false negative (reqFn) and false positive (reqFp) rates. The analysis returns the computed threshold, the independence constraint, and whether the derived threshold satisfies the provided accuracy requirements.
Optimal Threshold Derivation. For a threshold thr, denoting f_p and f_n as the false positive and false negative rates, we have

f_p = CDF(Φ_nm)(thr),  f_n = 1 − CDF(Φ_m)(thr)

(CDF(Φ)(x) is the cumulative distribution function of distribution Φ evaluated at x), and as thr increases, f_n decreases and f_p increases. Therefore, the requirements f_p ≤ reqFp and f_n ≤ reqFn can be translated into bounds thr_lo ≤ thr ≤ thr_hi.
If thr_lo > thr_hi, no thr satisfies the requirements and we set success to False. Otherwise, we aim to maximize the accuracy α = 1 − (1/2)(f_p + f_n) (assuming balanced positive and negative queries). We take the derivative of α with respect to thr as follows (PDF(Φ)(x) is the probability density function of distribution Φ evaluated at x):

dα/dthr = (1/2)(PDF(Φ_m)(thr) − PDF(Φ_nm)(thr))

We have dα/dthr > 0 when μ_m < thr < z, and dα/dthr < 0 when z < thr < μ_nm, where z is the intersection of the two PDF curves in the range μ_m < x < μ_nm (Figure 4). Solving for z is straightforward, as PDF(Φ_m)(x) = PDF(Φ_nm)(x) is a quadratic equation in x. Therefore, the optimal thr is the point closest to z in the range [thr_lo, thr_hi], i.e.,

thr_opt = max(thr_lo, min(thr_hi, z))   (5)

The analysis reports that a satisfying threshold was found iff thr_opt achieves the accuracy requirement, and returns the optimal threshold and the independence constraint c_ind on success.
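The derivation above can be evaluated numerically. The following Python sketch assumes normally distributed match and not-match distances with μ_m < μ_nm; the function and argument names (`optimal_threshold`, `req_fp`, `req_fn`) are ours, and the PDF intersection z is found by bisection rather than by solving the quadratic in closed form:

```python
from statistics import NormalDist

def optimal_threshold(phi_m, phi_nm, req_fp, req_fn):
    """Sketch of the optimal threshold derivation (eq. 5)."""
    thr_lo = phi_m.inv_cdf(1 - req_fn)   # f_n = 1 - CDF(phi_m)(thr) <= req_fn
    thr_hi = phi_nm.inv_cdf(req_fp)      # f_p = CDF(phi_nm)(thr)   <= req_fp
    if thr_lo > thr_hi:
        return None                      # no threshold meets both rate bounds
    # z: intersection of the two PDFs in (mu_m, mu_nm), found by bisection
    lo, hi = phi_m.mean, phi_nm.mean
    for _ in range(100):
        mid = (lo + hi) / 2
        if phi_m.pdf(mid) >= phi_nm.pdf(mid):
            lo = mid
        else:
            hi = mid
    z = (lo + hi) / 2
    thr = max(thr_lo, min(thr_hi, z))    # eq. (5)
    # accuracy assuming balanced positive and negative queries
    acc = 1 - 0.5 * (phi_nm.cdf(thr) + (1 - phi_m.cdf(thr)))
    return thr, acc
```

With equal standard deviations, z sits midway between the two means, so for Φ_m = N(0.3, 0.02) and Φ_nm = N(0.5, 0.02) the returned threshold is 0.4.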

Winner-Take-All Query Accuracy Analysis (wtaAccAnalysis)
Given a hypervector size N, a hardware specification hw, and a WTA query constraint wta-query(q, ds, k, w, m, x, t, x'), the analysis queries the analytical distance model M(q, ds, k, N, hw) to retrieve the corresponding match and not-match distance distributions Φ_m, Φ_nm and the associated independence constraint c_ind. The algorithm computes the expected hard and soft accuracies (acc and accSoft) from the Φ_m, Φ_nm distance distributions and the w, m, and t WTA query parameters. The expected accuracies are then compared with the provided accuracy requirements (acc ≥ x, accSoft ≥ x') to determine whether the hypervector size is sufficiently large. On success, the algorithm returns the independence constraint c_ind.
Winner-Take-All Query Accuracy. We present how the hard accuracy (acc) and soft accuracy (accSoft) are computed in this section. We start with acc. We denote the WTA query accuracy as A_wta(w, m, Φ_m, Φ_nm). Frady et al. [Frady et al. 2018] developed a theory that gives the expected WTA accuracy when w = 1 as follows:

A_wta(1, m, Φ_m, Φ_nm) = ∫ PDF(Φ_m)(x) · (1 − CDF(Φ_nm)(x))^(m−1) dx   (6)

An intuitive explanation of the above equation is that when the one positive vector has distance x to the query (with probability density PDF(Φ_m)(x)), the result is accurate iff the other m−1 distractor vectors all have a distance greater than x (each independently with probability 1 − CDF(Φ_nm)(x)); the accuracy is the integral over all possible x drawn from distribution Φ_m.
Accuracy with Multiple Winners. We extend the theory to handle the general case where w > 1.
Following the intuition of (6), if we have PDF(max_w Φ_m), the probability density function of the maximum of the distances of the w positive vectors, then

A_wta(w, m, Φ_m, Φ_nm) = ∫ PDF(max_w Φ_m)(x) · (1 − CDF(Φ_nm)(x))^(m−w) dx   (7)

because the result is accurate (i.e., the returned w vectors are all positives) iff the maximum distance of the w positives is no greater than the minimum of the m−w negatives. We then derive PDF(max_w Φ_m). First, we have

CDF(max_w Φ_m)(x) = CDF(Φ_m)(x)^w   (8)

because max_w Φ_m ≤ x iff all w positive distances are no greater than x, each independently with probability CDF(Φ_m)(x). Then, using the relation between PDF and CDF and the chain rule, we have

PDF(max_w Φ_m)(x) = w · CDF(Φ_m)(x)^(w−1) · PDF(Φ_m)(x)   (9)

With (9), we now conclude that

A_wta(w, m, Φ_m, Φ_nm) = ∫ w · CDF(Φ_m)(x)^(w−1) · PDF(Φ_m)(x) · (1 − CDF(Φ_nm)(x))^(m−w) dx   (10)

Analysis of Soft Accuracy Constraint. Besides specifying the desired accuracy, Heim also enables users to specify a soft accuracy constraint for WTA queries, with the following form: with probability x', the distances of the w positives are all among the top-t smallest. In other words, with probability x', at most t − w of the other m − w vectors in the codebook may have a distance smaller than any positive. To formulate this new constraint, we denote A_soft(w, m, t, Φ_m, Φ_nm) as the probability that the w positives all have distances among the top-t smallest. To calculate A_soft(w, m, t, Φ_m, Φ_nm), we enumerate the number j of negatives that have a smaller distance than some positive, which must be no more than t − w. For each j, given max_w Φ_m = x, the probability is that exactly j negatives have smaller distances than x, each independently with probability CDF(Φ_nm)(x), and the other m − w − j negatives have greater distances than x, each independently with probability 1 − CDF(Φ_nm)(x). Therefore, we have

A_soft(w, m, t, Φ_m, Φ_nm) = Σ_{j=0}^{t−w} C(m−w, j) ∫ PDF(max_w Φ_m)(x) · CDF(Φ_nm)(x)^j · (1 − CDF(Φ_nm)(x))^(m−w−j) dx   (11)

Substituting PDF(max_w Φ_m) with (9), we have

A_soft(w, m, t, Φ_m, Φ_nm) = Σ_{j=0}^{t−w} C(m−w, j) ∫ w · CDF(Φ_m)(x)^(w−1) · PDF(Φ_m)(x) · CDF(Φ_nm)(x)^j · (1 − CDF(Φ_nm)(x))^(m−w−j) dx   (12)
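The hard and soft accuracy integrals can be evaluated by one-dimensional numerical integration; a Python sketch (our own naming; midpoint rule over [0, 1], since Hamming distances lie in that range, with normal distributions assumed for both curves):

```python
from math import comb
from statistics import NormalDist

def wta_accuracies(w, m, t, phi_m, phi_nm, steps=4000):
    """Numerically evaluate the hard (eq. 10) and soft (eq. 12) WTA accuracies."""
    dx = 1.0 / steps
    hard = soft = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        # density of the maximum of the w positive distances, eq. (9)
        pdf_max = w * phi_m.cdf(x) ** (w - 1) * phi_m.pdf(x)
        q = phi_nm.cdf(x)
        hard += pdf_max * (1 - q) ** (m - w) * dx
        # at most t - w negatives may fall below the largest positive distance
        soft += pdf_max * sum(comb(m - w, j) * q ** j * (1 - q) ** (m - w - j)
                              for j in range(t - w + 1)) * dx
    return hard, soft
```

For w = 1 and t = 1 the two quantities coincide, matching the single-winner theory of (6).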

ANALYTICAL MODEL (𝑀)
The Heim analytical model derives the match distribution parameters ⟨μ_m, σ_m⟩ and the not-match distribution parameters ⟨μ_nm, σ_nm⟩ for the provided query, along with the mutual independence constraint c_ind that must hold for the analysis to be valid:

⟨MeanDist_m, MeanDist_nm, c_ind⟩ = MeanDist(q, ds, k)
⟨μ_m, σ_m⟩ = ToNormal(HwErr(hw, MeanDist_m), N)
⟨μ_nm, σ_nm⟩ = ToNormal(HwErr(hw, MeanDist_nm), N)

Sections 6.2-6.6 describe how the match and not-match mean distances are derived from the query and data structure (MeanDist), Section 6.7 describes how hardware error is incorporated into the error-free mean distance (HwErr), and Section 6.8 derives the standard deviation from the mean of the same distance distribution (ToNormal).
Formalization of HD Computation. A code c is a randomly generated hypervector that maps to a distinct atomic symbol (e.g., a letter). A code set S is a vector that is the superposition (bundle, +) of a set of codes. We denote c ∈ S if c is a code in the set, and S = {c_1, c_2, ..., c_n} if S is a superposition of codes c_1 + c_2 + ... + c_n. We denote a code tuple t as a product (⊙) of two or more codes, and create code tuples by binding codes, e.g., c ⊙ c' = t. A code tuple set T is a set of code tuples. The analysis works with codes, code tuples, code sets, and code tuple sets. All permutation operators over codes ρ_i(c) are represented as distinct codes c' in our formalization. This transformation can be applied because the dependency between c and ρ_i(c) does not affect distance computation.

Mutual Independence
The Heim analytical model assumes that both the data structure and the query contain mutually independent codes or tuples. In the analytical model, Heim identifies the mutual independence constraints that must hold. These mutual independence constraints are dynamically checked when constructing the data structure and query. The specific mutual independence constraint depends on the type of analytical procedure used to perform the analysis. We next present two types of mutual independence constraints. We discuss the implications of the mutual independence constraints in Section 6.10, and present an efficient dynamic independence checker and prove the equivalence of the mutual independence constraints to statistical independence in Section 7.1.
Independent Set. We define an independent set as a set of codes or tuples that are mutually independent. Formally, any HD expression can be flattened into the superposition (set) of code tuples T = {t_1, t_2, ..., t_n} (T = Σ_{i=1}^n t_i). T is an independent set if and only if, for any t_i, 1 ≤ i ≤ n, there exists no subset T' ⊆ T other than {t_i} whose binding equals t_i (i.e., t_i = ⊙_{t∈T'} t). For example, T = (a + b) ⊙ c + a ⊙ b = a ⊙ c + b ⊙ c + a ⊙ b is not an independent set, because a ⊙ c = (b ⊙ c) ⊙ (a ⊙ b). Recalling a tuple from an independent tuple set is similar to recalling a code from a simple code set (they fall into the same QDS type, see Section 6.4).
Independent Product. A product of sets P = ⊙_{i=1}^k S_i is called an independent product if and only if the multiplicand sets are disjoint and the sum of all the multiplicand sets Σ_{i=1}^k S_i is an independent set. For example, (a + b) ⊙ (c + d) is an independent product because the two sets are disjoint and a, b, c, d are mutually independent. (a ⊙ b + c ⊙ d) ⊙ (a ⊙ c + b ⊙ d) is not an independent product because a ⊙ b, c ⊙ d, a ⊙ c, b ⊙ d are not mutually independent, even though the two multiplicand sets are both independent sets. Note that a product of sets can also be flattened into a set of tuples, and the flattened set is usually not independent; e.g., (a + b) ⊙ (c + d) = a ⊙ c + a ⊙ d + b ⊙ c + b ⊙ d is not an independent set because (a ⊙ c) ⊙ (a ⊙ d) ⊙ (b ⊙ c) = b ⊙ d.

Query-Data Structure (QDS) Predicates
We introduce the concept of a query-data structure (QDS) predicate, a unifying formalization that enables Heim to implement an analysis that leverages both theoretical results from previous literature [Kleyko et al. 2021, 2022] and novel derivations (Section 6.6). A query-data structure predicate is a set membership expression of the form |Q ∩ DS| ≥ k, for which the symbolic match and not-match mean distances have been analytically derived. The Heim analysis supports three forms of QDS predicates (the tuple t and tuple set T can also simply be a code c and code set S): Type I predicates test if a code or tuple is in an independent set, Type II predicates test if at least k elements of a query set are in an independent set, and Type III predicates test if a tuple is in an independent product. Each QDS type is associated with an independence constraint on the data structure. Type I QDS can be seen as a special case of Type II QDS, but it is listed separately as it is the most frequently used. Note that these three QDS predicates systematically cover all cases where the data structure is independent, except for subset queries over an independent product. Subsets of independent products may have dependent tuples, which are theoretically challenging [Clarkson et al. 2023]. Such queries are rarely used in data structures [Kleyko et al. 2021, 2022]; we leave incorporating them as future work.
QDS Classification. Given a query and data structure, Heim classifies the query into one of the three supported QDS predicates by inspecting the data structure and the query expression form. If the data structure is in the form of a product of sums (bound tuple sets), then it must be a Type III query and the query must be a tuple. Otherwise, the data structure must be an independent code/tuple set, in the form of a sum of products (a bundle of code tuples). Then, if the query is a code set or code tuple set, it must be a Type II query; if the query is a code or code tuple, it must be a Type I query.
Mutual Independence Constraints. Depending on the QDS type, Heim returns a mutual independence constraint that must hold over the data structure for the corresponding analysis to be valid. For a sum-of-products formed data structure expr, the independence constraint c_ind requires that expr is an independent set. For a product-of-sums formed data structure expr, the independence constraint c_ind requires that expr is an independent product. The definitions of independent set and independent product are in Section 6.1.

Most Common Not-Match Distribution -Independent Vectors
We start with the simplest and most commonly used not-match distribution: the distance distribution between two independent (unrelated) vectors v_1 and v_2. We denote the mean distance between two vectors v_1 and v_2 as MeanDist(v_1, v_2) = E[Dist(v_1, v_2)]. In the following analysis, we assume all code sets and code tuple sets are of odd size; the reason is that when bundling an even number of vectors, the common practice is to add one more randomly generated vector to prevent potential ties in the majority [Kleyko et al. 2021]. Because the Hamming distance is the average distance across all vector dimensions, the mean distance between two vectors is the probability that they differ in any one dimension. Since no correlation exists between two independent vectors, each dimension of one is equally likely (with probability 0.5) to be the same as or different from the other. The mean distance is therefore:

MeanDist(v_1, v_2) = 1/2   (13)
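This derivation is easy to confirm empirically; the following tiny Monte Carlo sketch (illustrative only, not part of Heim) samples two unrelated binary hypervectors and measures their Hamming distance:

```python
import random

def hamming(v1, v2):
    """Hamming distance: fraction of differing dimensions."""
    return sum(a != b for a, b in zip(v1, v2)) / len(v1)

random.seed(0)
N = 100_000
v1 = [random.getrandbits(1) for _ in range(N)]
v2 = [random.getrandbits(1) for _ in range(N)]
d = hamming(v1, v2)   # empirically close to the derived mean of 1/2 (eq. 13)
```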

Type III (Tuple-Set) Analytical Model
In this QDS type, the query vector is a code tuple t and the data-structure vector is a product-of-sums formed code tuple set T = ⊙_{i=1}^k T_i, i.e., T is a binding of several code tuple sets. We derive results for this QDS type because binding code sets is a common operation used in constructing data structures, e.g., analogical databases [Kanerva 2010] and finite-state automata [Pashchenko et al. 2020]. Although T is also a tuple set (it can be flattened to sum-of-products form), this QDS type differs from Type I in that the tuples in T have dependencies, while QDS Type I assumes independence of the tuples in the set. For example, consider T = (c_1 + c_2) ⊙ (c_3 + c_4) = c_1 ⊙ c_3 + c_1 ⊙ c_4 + c_2 ⊙ c_3 + c_2 ⊙ c_4. The first tuple c_1 ⊙ c_3 is the binding of the other three. These dependencies make the distance distributions different.
Since this analysis requires that T = ⊙_{i=1}^k T_i is an independent product (Section 6.1), enabling us to view the tuples in ∪_{1≤i≤k} T_i as independent codes, in the following analysis we assume S = ⊙_{i=1}^k S_i, i.e., S is a product of code sets; the results generalize to the T = ⊙_{i=1}^k T_i case. We first consider the simple case where S = S_1 ⊙ S_1' is the product of two code sets. Assume S_1 = {c_1, c_2, ..., c_n} is of size n and S_1' = {c_1', c_2', ..., c_{n'}'} is of size n'. In the not-match case, |{t} ∩ S| < 1, i.e., t ∉ S, and the two independent vectors have mean distance (13). Otherwise, suppose t = c_1 ⊙ c_1'. The probability p_n that S_1 and c_1 are the same in a dimension is

p_n = ( Σ_{j=0}^{(n−1)/2} C(n−1, j) ) / 2^{n−1}   (15)

and we derive that:

MeanDist(t, S) = p_n (1 − p_{n'}) + (1 − p_n) p_{n'}   (16)

The derivation is as follows. Since binding is commutative, we have:

t ⊙ S = (c_1 ⊙ S_1) ⊙ (c_1' ⊙ S_1')

Therefore, for one dimension of t and S to differ, either c_1 and S_1 are the same in that dimension while c_1' and S_1' differ, or c_1' and S_1' are the same while c_1 and S_1 differ. The probability that c_1 and S_1 are the same in a dimension is p_n because it requires less than half (at most (n−1)/2) of c_2, c_3, ..., c_n to differ from c_1 in the dimension; the number of possible choices satisfying this is Σ_{j=0}^{(n−1)/2} C(n−1, j), and there are 2^{n−1} choices for c_2, c_3, ..., c_n in total. For c_1 and S_1 to differ in a dimension, at least (n+1)/2 of c_2, ..., c_n must differ from c_1, which happens with probability 1 − p_n. By symmetry, we also obtain the probability for c_1' and S_1' to be the same or to differ in one dimension. Combining these gives (16). Note that the computation of (15) and (16) can be sped up by pre-computing the binomial coefficients and their prefix sums with Pascal's triangle.

More generally, S can be the binding of k ≥ 2 sets. Assume S = ⊙_{i=1}^k S_i, where S_i = {c_1^(i), c_2^(i), ..., c_{n_i}^(i)} is a code set of size n_i, and t = ⊙_{i=1}^k c_1^(i). In this general case, we have:

MeanDist(t, S) = Σ_{B ⊆ {1,...,k}, |B| odd} ∏_{i∈B} (1 − p_{n_i}) ∏_{i∉B} p_{n_i}   (17)

The derivation is similar to (16). By commutativity of binding, we have:

t ⊙ S = ⊙_{i=1}^k ( c_1^(i) ⊙ S_i )

Therefore, denoting b_i as the value of one dimension of c_1^(i) ⊙ S_i (0 or 1), for t ⊙ S to be 1 in that dimension, an odd number of the b_i must be 1, i.e., Σ_{i=1}^k b_i must be odd. The probability of each b_i being 0 or 1 has been derived for (16). Adding the probabilities of all the independent cases gives (17). Note that for large k, (17) is non-trivial to compute, as there are an exponential number of cases where Σ_{i=1}^k b_i is odd. However, common HDC computations do not involve binding more than 2 sets; we leave computing (17) more efficiently to future work. For Type III QDS queries, equation (17), or (16) when k = 2, computes the match mean distance (MeanDist_m). Equation (13) computes the not-match mean distance (MeanDist_nm). The mutual independence constraint (c_ind) for this QDS query requires that ⊙_{i=1}^k {c_1^(i), ..., c_{n_i}^(i)} is an independent product.
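Equations (15)-(17) can be evaluated directly with binomial coefficients; a Python sketch follows (function names are ours; the Pascal's-triangle speedup is omitted for clarity):

```python
from itertools import combinations
from math import comb

def p_agree(n):
    """Eq. (15): probability that one dimension of a bundle of n (odd) random
    codes agrees with a given member code, i.e., at most (n-1)/2 of the other
    n-1 codes differ in that dimension."""
    assert n % 2 == 1, "code sets are assumed to have odd size"
    return sum(comb(n - 1, j) for j in range((n - 1) // 2 + 1)) / 2 ** (n - 1)

def match_mean_dist2(n1, n2):
    """Eq. (16): match mean distance for a tuple queried against S1 (x) S1'."""
    p, q = p_agree(n1), p_agree(n2)
    return p * (1 - q) + (1 - p) * q

def match_mean_dist(sizes):
    """Eq. (17): sum over all cases where an odd number of factors differ."""
    ps = [p_agree(n) for n in sizes]
    total = 0.0
    for r in range(1, len(ps) + 1, 2):            # odd-sized subsets differ
        for diff in combinations(range(len(ps)), r):
            prod = 1.0
            for i, p in enumerate(ps):
                prod *= (1 - p) if i in diff else p
            total += prod
    return total
```

For k = 2 the general form collapses to (16), e.g. match_mean_dist([3, 3]) equals match_mean_dist2(3, 3) = 0.375.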

Hardware Error-Aware Mean Distance Model (𝐻𝑤𝐸𝑟𝑟 (ℎ𝑤,𝑀𝑒𝑎𝑛𝐷𝑖𝑠𝑡))
HDC is a suitable computing paradigm for emerging hardware platforms because it is highly resilient to noise [Halawani et al. 2021; Imani et al. 2017b, 2019c; Karunaratne et al. 2020; Poduval et al. 2021]. Heim incorporates the noise present in hardware, and this treatment applies uniformly to all the distance distributions above. We consider the bit-flip error model, where each bit in a hypervector flips with probability p. Bit flips change the mean distance between two vectors; we denote MeanDist'(v_1, v_2) as the mean distance of two vectors considering possible bit flips:

MeanDist'(v_1, v_2) = 1 − [ (1 − MeanDist(v_1, v_2)) · (p² + (1−p)²) + MeanDist(v_1, v_2) · 2p(1−p) ]   (18)

The derivation is as follows. 1 − MeanDist(v_1, v_2) is the probability that the two vectors are the same in one dimension, and 1 − MeanDist'(v_1, v_2) is that probability considering bit flips. There are two cases where they are the same in one dimension with possible bit flips. First, they can be the same before possible bit flips, with probability 1 − MeanDist(v_1, v_2); then the two vectors either both have a bit flip or both have no bit flip in the dimension, with probability p² + (1−p)². Second, they can differ before possible bit flips, with probability MeanDist(v_1, v_2), and a bit flip occurs in only one of the two vectors in this dimension, with probability 2p(1−p).

Hardware Errors Increase the Expected Distance Between Vectors. In all cases we consider, MeanDist(v_1, v_2) ≤ 1/2. The maximum MeanDist(v_1, v_2) is 1/2, attained when v_1 and v_2 are unrelated, as shown in (13); relatedness of vectors makes their mean distance smaller. We show that possible bit flips increase the mean distance between vectors:

MeanDist'(v_1, v_2) − MeanDist(v_1, v_2) = 2p(1−p) · (1 − 2 · MeanDist(v_1, v_2)) ≥ 0   (19)
A larger mean distance means the distribution is closer to that of unrelated vectors, implying loss of the relational information encoded in the hypervectors. The implication is that hardware noise decreases the information resolution, which is intuitive. We note that the information loss increases with p in a reasonable noise range, as in (19) 2p(1−p) increases monotonically for 0 < p < 0.5. This enables us to use an upper bound on p in our analysis and deliver a sound accuracy guarantee.
Bit-Flip Probability. We derive the bit-flip error probability from the hardware specification hw. As standard practice, the raw bit-flip error rate is commonly used to characterize hardware noise [Grossi et al. 2019; Le et al. 2021; Li et al. 2016]. Given the hypervector operators and memory locations op ∈ Ops = {bind, bundle, codebook, item_mem, query} from the hardware specification, we denote the error rate of an operator as p(op) and compute p as the probability that at least one bit flip happens in some operator or memory location. Note that p is an upper bound on the probability that a bit flip occurs in the query or data structure during distance calculation, and using p delivers a sound accuracy analysis, as the information loss increases with p in the reasonable range 0 < p < 0.5 (shown in (19)).
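The bit-flip transform on the mean distance is small enough to sketch directly (function name ours); the identity checked at the end restates the distance increase of 2p(1−p)(1 − 2·MeanDist):

```python
def with_bit_flips(d, p):
    """Mean distance d after independent per-bit flips with probability p
    (eq. 18). Two vectors stay equal in a dimension if they were equal and
    both/neither bit flipped, or if they differed and exactly one bit flipped."""
    same = (1 - d) * (p ** 2 + (1 - p) ** 2) + d * 2 * p * (1 - p)
    return 1 - same
```

Note that unrelated vectors (d = 1/2) are unaffected, while any smaller distance is pushed toward 1/2 as p grows on (0, 0.5).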

Correspondence between Mean Distance and Distance Distributions (ToNormal)
Given a mean distance MeanDist'(v_1, v_2) considering bit flips and a hypervector size N, we can derive the standard deviation to get the corresponding normal distribution N(μ, σ). Denote X as the value of one dimension in v_1 ⊙ v_2. Note that

E[X] = MeanDist'(v_1, v_2)

Since X is either 0 or 1, the following holds:

Var[X] = E[X²] − E[X]² = E[X](1 − E[X])

Since all N dimensions in v_1 ⊙ v_2 are independent and symmetric, we have

μ = MeanDist'(v_1, v_2),  σ = √( MeanDist'(v_1, v_2) · (1 − MeanDist'(v_1, v_2)) / N )

To sum up, the distance distribution is determined by the mean distance and the hypervector size N.
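The ToNormal step can be sketched with Python's stdlib NormalDist (the function name to_normal is ours):

```python
from math import sqrt
from statistics import NormalDist

def to_normal(mean_dist, n):
    """The distance is the average of n i.i.d. Bernoulli(mean_dist) bit
    disagreements, so it is approximately normal with
    sigma = sqrt(mean_dist * (1 - mean_dist) / n)."""
    return NormalDist(mean_dist, sqrt(mean_dist * (1 - mean_dist) / n))
```

For example, for unrelated vectors (mean distance 1/2) at N = 10,000 the standard deviation is 0.005, which is why larger hypervectors separate the match and not-match distributions more sharply.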

Discussion on Model Simplifications
Use of Normal Distributions. We next justify the use of normal distributions to model match and not-match distances. Recall that the distance metric is the Hamming distance, essentially the average distance across all dimensions. Since all the dimensions are symmetric, the distance in each dimension follows the same distribution; the overall distance is therefore the average of many i.i.d. variables. Furthermore, hypervectors are long in HD computation, meaning that the number of i.i.d. variables averaged is large. By the central limit theorem, the distance distributions can be well approximated by normal distributions. In fact, the Kolmogorov-Smirnov difference (the supremum of the absolute difference) between the cumulative distribution functions (CDFs) of the binomial distribution and its corresponding normal distribution is bounded by O(N^(−1/2)) [Nagaev and Chebotarev 2011]. Approximation with normal distributions is also standard practice among theoreticians in this field [Frady et al. 2018], and we note that using binomial distributions to model hypervector distances is computationally expensive (costly to compute the PDF and CDF, compared with normal distributions). Therefore, we view the distance distributions as normally distributed, with a standard deviation and mean that both depend on the hypervector dimension, the query and data structure sizes, and the bit error probability.
Elimination of Permutation Operations. Because each bit of a code c is independently randomly generated, each bit of c and the corresponding bit of ρ_i(c) are independent, and thus the distance distribution between c and a permutation ρ_i(c) of it is exactly the same as that between two independently generated codes, as in (13), unless the permutation amount i is equal to N (the number of dimensions) or a multiple of N. Therefore, we may treat permuted codes as codes independent of the originals.

Fig. 5. Heim accuracy analysis

Discussion on Mutual Independence

Heim requires the HDC data structure and query to satisfy mutual independence constraints for the analysis to hold. We show that hypervectors that satisfy mutual independence can represent sets, knowledge graphs, analogical databases, and NFAs (see supplementary materials). In addition, a number of other data structures, including stacks, sequences, and 2D images, can also be represented. These data structures are useful for signal and language classification, information retrieval, workload balancing, and analogical reasoning workloads [Kleyko et al. 2021]. In data structures with correlated information, independence can be induced by partitioning the data structure across multiple hypervectors, where each hypervector encodes mutually independent elements. Because the hypervector size increases linearly with the number of stored elements (Figure 9a), the independent sub-hypervectors take up almost the same amount of space. We note this technique does not work well for applications where the information loss induced by the bundling operation is a feature, such as feature encoding for machine learning applications, and it cannot be used on queries with correlated elements.
Correlated Data Structures. The analysis of models with correlation is known to be hard and is an open problem in the HDC community [Clarkson et al. 2023]. This work establishes a core HDC analysis that is precise and sound. In the future, the analysis can be extended to directly support data structures that have correlations; these extensions would likely need to use overapproximations or empirically derived information, and would not deliver the same guarantees as Heim's core analysis.
Example. It is possible to implement data structures with correlations while satisfying Heim's mutual independence constraints. Consider a set of five edges of a 4-node graph, where each edge is the binding of its two node codes. If we encode this set as a single vector V (the superposition of the five edge tuples), the expected distance between V and one of its member edges can be 1/2 (obtained by enumerating all 2^4 value combinations of the four node codes in one dimension), totally indistinguishable from the distance between two independent vectors, even though the edge is a member of the set. In this case, no threshold can achieve a > 50% accuracy (random guessing). However, one can decompose a dependent set into a number of independent sets, each stored in one vector. Therefore, instead of storing the graph as the set of all edges, we can store the set of edges incident to each node as one vector, similar to adjacency lists. This way, each vector is a mutually independent set, and a query falls under the QDS Type I predicate (Section 6.4).

HEIM OPTIMIZATION FRAMEWORK
Figure 5 presents the Heim optimization algorithm. The optimizer takes a hardware error specification (hwModel), a Heim specification (heimSpec), and a maximum hypervector size (maxN) as input, and returns the smallest hypervector size that satisfies all accuracy constraints (optDim), together with a query-specific collection of hypervector sizes, independence constraints, and distance thresholds (queryParams) (line 13). Heim iterates over each query accuracy constraint in the Heim specification and derives the minimum hypervector size minDim required for the query, the mutual independence constraints indepCstr that must hold for the analysis to be sound, and a set of distance thresholds thr to use for threshold-based queries (lines 6-7, 9-10). The wtaAccAnalysis.binsearch and thrAccAnalysis.binsearch routines derive the minimum hypervector size minDim and the associated thresholds and independence constraints for the given query by performing a binary search over hypervector sizes 0..maxN and invoking the appropriate accuracy analysis for each candidate size. The final hypervector size returned is the maximum required dimension across all queries. Heim returns early with an error if the accuracy analysis fails to find an appropriate size for any one query.
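The binary search at the core of the binsearch routines can be sketched as follows; `meets_requirement` is a stand-in for one query's accuracy analysis at a candidate size (the naming is ours, not Heim's API):

```python
def min_hypervector_size(meets_requirement, max_n):
    """Binary search for the smallest hypervector size in [1, max_n] that
    satisfies the accuracy requirement; soundness relies on accuracy being
    monotone in the hypervector size."""
    lo, hi, best = 1, max_n, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if meets_requirement(mid):
            best, hi = mid, mid - 1    # feasible: try smaller sizes
        else:
            lo = mid + 1               # infeasible: need a larger size
    return best                        # None if even max_n is insufficient
```

Returning None on failure corresponds to Heim's early exit with an error when no size up to maxN satisfies a query's accuracy constraint.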

Dynamic Independence Checker
Heim offers an optional dynamic checking algorithm that validates that the concrete data structure and query hypervectors meet all independence constraints associated with the selected query. The independence checker helps users build data structures that satisfy Heim's independence requirements, and it can be disabled if the user is sure these requirements are met. The algorithm tests whether a concrete HD expression contains mutually independent tuples or codes. Intuitively, a set of elements is mutually independent if the existence of each set element does not depend on the existence of the other set elements.
Algorithm. We present an efficient dynamic checking algorithm for validating the mutual independence of products and sets of tuples. Since the definition of product independence is derived from set independence, an efficient set independence checker can also check product independence. To avoid checking independence with a time-intensive exhaustive search, we derive an equivalent condition that can be checked efficiently. Given a tuple set T = Σ_{i=1}^n t_i, denote c_1, c_2, ..., c_m as the codes that are a factor of some t_i. For each tuple t_i, 1 ≤ i ≤ n, in T, we create a binary vector b_i of length m, where the j-th element is 1 if c_j is a factor of t_i, and 0 otherwise. The equivalent condition for the independence of the set T = Σ_{i=1}^n t_i is that b_1, b_2, ..., b_n are linearly independent in GF(2). In other words, if we form an n × m matrix B, where the i-th row is b_i, the equivalent condition is that B has rank n in GF(2). For example, for T = a ⊙ c + b ⊙ c + a ⊙ b, if c_1, c_2, c_3 are a, b, c respectively, the vectors for a ⊙ c, b ⊙ c, a ⊙ b are [1,0,1], [0,1,1], [1,1,0] respectively. These three vectors are not linearly independent because [1,0,1] + [0,1,1] = [1,1,0] in GF(2), so T is not an independent set. Intuitively, this is because binary vector addition in GF(2) represents the binding of tuples; e.g., the 1+1 = 0 in the last element of the previous example arises because binding a ⊙ c and b ⊙ c cancels out c. Calculating the rank of an n × m matrix can be done with an O(n²m) or O(nm²) Gaussian elimination algorithm.
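A compact sketch of the checker (our naming; tuples are given as iterables of code identifiers, and incidence vectors are packed into integer bitmasks so GF(2) row reduction becomes XOR):

```python
def is_independent_set(tuples, codes):
    """Check mutual independence of a tuple set: build each tuple's factor
    incidence vector as a bitmask and test linear independence over GF(2)
    with an XOR-basis (Gaussian) elimination."""
    idx = {c: i for i, c in enumerate(codes)}
    basis = []
    for t in tuples:
        v = 0
        for c in t:
            v ^= 1 << idx[c]           # incidence vector of the tuple
        for b in basis:
            v = min(v, v ^ b)          # reduce against the current basis
        if v == 0:
            return False               # tuple is a binding of earlier tuples
        basis.append(v)
    return True
```

On the example from the text, {a⊙c, b⊙c, a⊙b} is rejected (the third vector reduces to zero), while an edge chain such as {a⊙b, b⊙c, c⊙d} passes.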
Correctness. A binding of tuples corresponds to the sum of their binary vectors in GF(2). For an index set I ⊆ {1, 2, ..., n}, t_j = ⊙_{i∈I} t_i is equivalent to the linear equation b_j + Σ_{i∈I} b_i = 0 in GF(2). Therefore, when T is an independent set, no such linear equations exist, which is equivalent to the linear independence of the b_i vectors.
Connection to Statistical Independence. From the formalism of the independence checker, we can derive that T being an independent set is equivalent to the bit values of the tuples t_i ∈ T being statistically mutually independent. We consider the value of one dimension for all the vectors; all other dimensions follow the same argument. Denote the values of the code vectors as x, i.e., x_j is the value of c_j's vector in that dimension. x can take any of 2^m values, each with probability 2^(−m), as codes are independently randomly generated. For an assignment y of the tuple vector values, we have y = Bx. There are 2^n possible ys. For each given assignment y, its probability mass is 2^(−m) times the number of solutions x of the equation Bx = y. Since B has rank n, there is guaranteed to be at least one solution, and after Gaussian elimination, the echelon form has m − n free variables in x, each of which can be either 0 or 1. This means there are exactly 2^(m−n) solutions x, and thus the probability mass of this assignment y is 2^(m−n) · 2^(−m) = 2^(−n). Thus, the joint distribution of the vector bit values of the t_i is a uniform distribution over all 2^n possible values (each bit is 0 or 1 with equal 50% probability). This is exactly the joint distribution of n randomly generated codes, so we can analyze them as if they were independent codes.
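The counting argument can be verified by brute force for small examples; in this sketch (our own naming), tuples are given as index tuples over m codes, and "(x)" in the comments stands for binding:

```python
from itertools import product

def tuple_bit_counts(tuples, m):
    """Enumerate one dimension: for each of the 2^m code-bit assignments x,
    compute the tuple bits y = Bx over GF(2) and count each outcome y."""
    counts = {}
    for x in product((0, 1), repeat=m):
        y = tuple(sum(x[j] for j in t) % 2 for t in tuples)
        counts[y] = counts.get(y, 0) + 1
    return counts

# Independent set {c0 (x) c1, c1 (x) c2, c2 (x) c3}: B has rank 3, m = 4, so
# each of the 2^3 outcomes y occurs 2^(m-n) = 2 times -- a uniform joint
# distribution, as the argument predicts.
indep = tuple_bit_counts([(0, 1), (1, 2), (2, 3)], 4)
# Dependent set {c0 (x) c2, c1 (x) c2, c0 (x) c1}: the third bit is forced by
# the first two, so only 4 of the 8 outcomes are possible.
dep = tuple_bit_counts([(0, 2), (1, 2), (0, 1)], 3)
```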

Discussion
Use of Binary Search. Heim's optimization algorithm exploits the fact that the accuracy of the HD computation increases monotonically with hypervector size to parametrize the HD computation efficiently. This has been shown theoretically [Frady et al. 2018; Gallant and Okaywe 2013; Kleyko et al. 2022], and we also verify it in our evaluation (Figures 9a-9d). Because the accuracy is monotonic with respect to hypervector size, we can perform a binary search over hypervector sizes to identify the smallest hypervector size that satisfies a minimum accuracy requirement.
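A sketch of this search, assuming we are given a predicate that reports whether a candidate size meets the accuracy target (the predicate and the toy accuracy model below are illustrative stand-ins, not Heim's actual analysis):

```python
def smallest_size(meets_accuracy, lo=1, hi=100_000):
    """Binary search for the smallest hypervector size whose predicted
    accuracy meets the target, relying on accuracy being monotonically
    non-decreasing in the hypervector size."""
    if not meets_accuracy(hi):
        return None  # even the largest size misses the target
    while lo < hi:
        mid = (lo + hi) // 2
        if meets_accuracy(mid):
            hi = mid       # mid works; a smaller size might too
        else:
            lo = mid + 1   # mid fails; every smaller size fails as well
    return lo

# Toy analytical model in which accuracy grows monotonically with size.
print(smallest_size(lambda n: 1 - 0.5 ** (n / 1000) >= 0.99))  # 6644
```

Monotonicity is what makes the halving step sound: once a size fails, every smaller size is guaranteed to fail too.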
Metadata for Independence Checker. Many usage patterns involve building a data structure once and then querying the data structure hypervector. Once the data structure is checked for independence and built as hypervectors, no independence metadata about the data structure needs to be stored. Only the subset queries must be checked for independence when the data structure is queried, which can be done with the algorithm we described. We note it may be possible to directly embed this check in the encoding computation, which could be more lightweight.

EVALUATION ON ERROR-FREE HARDWARE
We evaluate Heim on five analysis-amenable HDC-based data structures over five different data structure complexities (n or (n, m)) - Table 2 summarizes the complexity of the randomly generated data structures and queries at each size. For example, the set-100 benchmark has complexity n = 100, and is evaluated over random sets containing 50-100 elements and single-element queries over the set. Each Heim data structure parametrization is evaluated over 100 randomly generated data structures and 20 randomly generated match/not-match queries, where half the queries evaluate to "match". The accuracy of each data structure-query computation is evaluated over ten randomly sampled codebooks. All baseline and Heim executions are evaluated over the same randomly sampled data structures, queries, and codebooks to reduce the effect of variance on the evaluation.
Query Accuracy Metric. Given P matching query executions and N not-matching query executions that produce TP true positive and TN true negative results, the accuracy of each benchmark is defined as the average of the true positive and true negative rates, (1/2)(TP/P + TN/N). We employ a balanced strategy where false positive and false negative rates are equally important, since the real distributions of positive and negative queries depend on the target application. Heim supports unbalanced false positive and false negative rates, so unbalanced query distributions can also be handled. We also report the accuracy ratio (rat) for benchmark applications, corresponding to the percentage of random data structure instantiations satisfying the target accuracy.
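For concreteness, the balanced metric above can be computed as follows (a trivial sketch; the function name is ours):

```python
def balanced_accuracy(tp, p, tn, n):
    """Average of the true-positive and true-negative rates,
    (1/2)(TP/P + TN/N), so that false positives and false negatives
    are weighed equally regardless of the query mix."""
    return 0.5 * (tp / p + tn / n)

# 10 matching and 10 non-matching queries; 10 and 9 answered correctly.
print(balanced_accuracy(10, 10, 9, 10))  # 0.95
```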
Baselines. We evaluate Heim-optimized hypervector size and threshold parametrizations against dynamic tuning-based baselines. These comparisons isolate the accuracy and performance benefits delivered by Heim over traditional parameter tuning approaches. Each parametrization is optimized to deliver a target query accuracy of 99%, all dynamic tuning baselines use the dynamic tuning algorithm from Figure 2a, and all described baselines have error injection disabled:
▶ Heim: Heim is used to statically optimize the distance thresholds and hypervector size for each benchmark to attain 99% accuracy. For queries composed of sets of elements (db-match, nfa), Heim computes distance thresholds for different query set sizes.
▶ dt-par: The hypervector size is fixed at 10,000 bits and a single distance threshold is dynamically tuned to attain 99% query accuracy. WTA queries accept no settable query parameters and therefore do not have dt-par executions.
▶ dt-all: The hypervector size is dynamically tuned to find the smallest size between 1-100,000 bits that attains a query accuracy of 99%. For each candidate size, a single distance threshold is dynamically tuned to maximize accuracy. The search saturates at 100,000 bits.
▶ dt-hybrid: The hypervector size is dynamically tuned as in dt-all, but Heim is used to find the theoretically optimal distance thresholds for each candidate hypervector size. This baseline isolates the effect of using Heim-derived thresholds for hypervector queries.

Query Accuracy Comparison
Figure 6 compares the query accuracy of Heim-optimized programs against the baseline executions.
The plot charts the median (timeseries) and Q1 and Q3 (vertical bars) for each execution; the query accuracies that violate the 99% accuracy requirement are shaded grey. Heim achieves 99.2%-100.0% median accuracy across all benchmarks - the required accuracy target of 99% is therefore met on expectation. Qualitatively, Heim-optimized executions have low variance in accuracy (vertical bars) across trials and generally deliver consistent accuracy across different benchmark sizes compared to the dynamically tuned and statically sized baselines. Therefore, Heim-optimized data structures reliably satisfy the desired query accuracy targets and consistently deliver the expected accuracy. We note that 80% of the executed trials exceed the 99% accuracy target (80% rat) across all benchmarks - we do not observe adherence to the accuracy constraint 100% of the time because Heim's guarantees hold on expectation. In contrast, the dynamically tuned (dt-all) baseline delivers 54.5%-99.5% median query accuracy, where only four benchmark evaluations (4 points) have median accuracies that meet the accuracy target of 99%. The dt-all benchmark evaluations also experience more significant fluctuations in query accuracy (vertical bars) than Heim-optimized HD computations, and their accuracy degrades as the benchmark size increases, likely because the empirically derived parametrizations do not generalize well, especially as the queries and data structures grow in complexity. Statically fixing the hypervector size to 10,000 bits and dynamically tuning only the hypervector threshold (dt-par) attains higher median accuracies than full dynamic tuning and Heim when the data structure is small. For 3 of 5 benchmarks, dt-par achieves at least 99% median accuracy for the smaller 1-2 benchmark executions. The median accuracy of dt-par evaluations substantially degrades as the size of the data structures increases - over all executions, dt-par-optimized programs attain a 50.5%-100.0% median accuracy. This phenomenon occurs because threshold-only tuning cannot expand the hypervector size to accommodate hypervectors that implement larger data structures and encode more information.
Hybrid Optimization. We also evaluate the query accuracy of a hybrid optimization approach (dt-hybrid) that uses Heim to find thresholds, given a dynamically tuned hypervector size. For 24 of the 25 benchmark executions, dt-hybrid executions attain median accuracies that are 0.1%-45.5% higher than dt-all. Notably, the dt-hybrid executions attain substantially better accuracy than dt-all on the nfa and db-match benchmarks, which supports the claim that specializing the distance threshold to the query size is important for queries containing multiple elements. The db-match and nfa benchmarks use queries containing multiple elements and therefore likely work best when the distance thresholds are selected based on the query size. Because the dynamic tuning baselines tune only one threshold, that threshold may not work well across different query sizes. While it is possible to tune multiple thresholds dynamically, this would be prohibitively expensive.

Hypervector Size Comparison
Figure 7 compares the hypervector sizes of Heim-optimized executions against dynamically tuned (dt-all) and statically sized (dt-par) executions. The red-shaded cells fail to meet the 99% median accuracy requirement, and the grey-shaded cells meet the 99% accuracy requirement less than 80% of the time. For the 4 of 25 dt-all benchmark executions which achieve 99% median accuracy, dt-all finds 1.27x-1.52x smaller hypervector sizes than Heim for three executions, and a 1.15x larger hypervector size than Heim for one execution. Of the benchmarks that do not meet the accuracy requirement, 9 of the 25 executions max out at the largest hypervector size. Therefore, dynamic tuning (dt-all) typically produces smaller hypervectors than Heim when it can find a parametrization that meets the desired accuracy target, as Heim employs a conservative strategy, but dt-all is rarely able to find a parametrization that reliably delivers a 99% query accuracy on average. In contrast, Heim chooses larger hypervectors than dynamic tuning but reliably meets the accuracy target on expectation with low variance in all cases. In cases where dynamic tuning cannot meet the target query accuracy (size = 100k), the dynamic tuning algorithm selects hypervector sizes 4.55x-25x larger than Heim. Heim can analytically derive smaller hypervectors by searching over a larger parameter space that includes thresholds and sizes tailored to specific queries and data structures.
For the 10 of 25 dt-par benchmark executions which achieve 99% median accuracy, Heim finds hypervectors that are 1.47x-7.14x smaller than 10k bits for 7 of 10 executions, and finds hypervectors that are 1.02x-1.42x larger than 10k bits for 3 of 10 executions, where 10k is the statically configured hypervector size. We also find that dt-par's accuracy degrades for larger benchmarks and that it selects unnecessarily large hypervectors for small benchmark executions. These issues arise because dt-par cannot flexibly adjust the size to accommodate larger or smaller benchmark executions. Therefore, Heim more consistently meets the target accuracy requirement than static allocation strategies while also delivering space savings for executions that can execute with smaller numbers of bits.

Optimization Time Comparison
Figure 8 compares the optimization runtimes of Heim against the dynamic tuning baselines. Heim completes its analysis in 85-3210 milliseconds, while dt-all takes 40 seconds to 22.72 hours to optimize the hypervector size and threshold. Fixing the hypervector size and dynamically tuning only the threshold (dt-par) is substantially faster than full dynamic tuning, taking between 3.9 seconds and 168.8 seconds. The Heim optimizer is 303.0x-100167.4x faster than full dynamic tuning (dt-all) and 30.0x-874.4x faster than threshold-only dynamic tuning (dt-par), and generally scales better as the benchmark size increases. Because both dynamic tuning approaches are simulation-based, execution time scales poorly as the number of parameters to tune and the complexity of the data structure increase. In contrast, Heim's static analysis procedure is model-based and computes the optimal threshold and hypervector size in constant time. The performance of the parameter derivation algorithm is insensitive to the size of the optimized data structure, enabling scalable analysis.

EVALUATION OF HEIM ON EMERGING HARDWARE TECHNOLOGIES
Heim's accuracy analysis enables sound optimization of HD computations to execute with acceptable accuracy in the presence of hardware error. This capability enables the optimization of HD computations for error-prone emerging hardware technologies. We use Heim to systematically study the benefits and drawbacks of using different emerging technologies for hyperdimensional computation. We analyze the performance benefits offered by analog content-addressable memories (CAMs) (Section 9.2) and the storage benefits offered by analog multi-bit storage arrays (Section 9.1).

Heim Storage Density Analysis with MLC ReRAMs
We use Heim to analyze the storage benefits of using multiple-bit-per-cell (MLC) ReRAM-based item memories for HD computation against conventional one-bit-per-cell (1 BPC) DRAM-based memory.
We use Heim to minimize the hypervector size while delivering a target query accuracy of 99% for 2 BPC ReRAM, 3 BPC ReRAM, and conventional DRAM.
MLC ReRAMs. ReRAM is an emerging resistive memory technology that is prone to bit corruption but delivers fast access times, non-volatility, and improved density. We investigate the benefits of 2 BPC and 3 BPC ReRAM, which have raw bit error rates of

Heim Performance Analysis with Analog CAMs
We evaluate the benchmark applications on embedded (micro), multicore (multi), and emerging hardware (cam) platforms. Each architecture is simulated with the x86 gem5 simulator using the processor and cache hierarchy presented in Table 3 [Binkert et al. 2011]. The embedded and multicore baselines are based on the Intel Atom x7211E and Intel Core i9-10900E x86 architectures, respectively, and the cam hardware platform uses the embedded architecture coupled with an analog CAM that efficiently performs item memory lookups to implement HD computations, but introduces error into the computation. To ensure an iso-accuracy comparison, we use Heim to optimize each execution to achieve a 99% accuracy on the respective hardware platform. The benchmarks are parallelized and implemented in C, where the query hypervector is built in parallel and the item memory/query distances are computed in parallel. For the cam hardware platform, the software instead computes the item memory/query distances by dispatching a single query to the CAM.
The Analog CAM. The cam hardware platform has the same baseline characteristics as the embedded hardware platform but also offers a ReRAM-based analog CAM that uses Ohm's and Kirchhoff's laws to perform in-memory, parallel Hamming distance calculations against a query bit vector [Imani et al. 2017b]. Because the analog CAM both performs analog computation and uses an emerging device technology, the associated Hamming distance calculations are unreliable and complete with a bit error rate of 0.14%. The analog CAM is extremely fast and completes the entire item memory query in 2.74-11.90 nanoseconds, provided 6-100 item memory rows are in use. The associated latency increases with row width, and the bit error rate increases with row width and device density. We parametrize the CAM to use 10k-bit rows, where >10k-bit hypervectors are split across rows; these row distances are summed in the analog domain with Kirchhoff's law. The communication costs between the embedded system and the CAM are modeled as a DRAM access, and the threshold query latency is conservatively approximated with the WTA lookup latency. (For db-analogy simulation, we set the item memory and codebook sizes to 1/4 of the actual size to avoid memory issues with the gem5-x86 simulator.) We build performance and hardware error models by fitting regressions to the figures presented in the associated paper [Imani et al. 2017b].
Analysis. Figure 10 presents the simulated runtime of a single query as a function of the benchmark size, averaged over 20 executions. On average, the multi, micro, and cam executions take 0.589-5.418, 0.385-283.226, and 0.000-4.526 milliseconds respectively, depending on the benchmark. We observe that, unsurprisingly, the multi executions complete much faster than the micro executions for large benchmarks and execute only slightly slower on small benchmarks. The multicore platform has 5x more cores at its disposal and operates at 2.8x the frequency of the embedded system; it can therefore complete HD computations much faster, provided the synchronization overheads amortize. However, once the embedded platform is paired with a CAM, the embedded platform's performance becomes competitive and, in some cases, better than the multicore system. The cam executions are 2.16x-389.51x faster than micro executions for all benchmarks and 1.35x-62.55x faster than multi executions for 21 out of 25 benchmarks. The cam delivers substantial performance improvements because it significantly accelerates the item-memory search, which is usually the bottleneck when the query encoding is simple. We observe that these performance benefits hold even though Heim optimizes the CAM executions to use 1.011x-1.012x larger hypervectors to ensure iso-accuracy in the presence of hardware error. The multi executions are faster than cam in the largest 3 db-match benchmarks because the query encoding uses the bundling operation and therefore becomes the bottleneck.
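To illustrate how a per-bit error rate like the CAM's 0.14% perturbs a query, the following sketch injects independent per-bit comparison errors into a Hamming distance calculation. This is a simplified model of our own construction for illustration, not the regression-fitted error model described in the text:

```python
import random

def noisy_hamming(a, b, bit_error_rate, rng=random):
    """Hamming distance between two bit vectors where each per-bit
    comparison independently flips with probability `bit_error_rate`,
    a simplified stand-in for an error-prone analog CAM lookup."""
    dist = 0
    for x, y in zip(a, b):
        miscompare = rng.random() < bit_error_rate
        dist += (x != y) ^ miscompare  # a miscompare toggles this bit's result
    return dist

rng = random.Random(0)
a = [rng.randint(0, 1) for _ in range(10_000)]
b = [rng.randint(0, 1) for _ in range(10_000)]
exact = sum(x != y for x, y in zip(a, b))
noisy = noisy_hamming(a, b, 0.0014, rng)  # 0.14% bit error rate
print(abs(noisy - exact))  # only a handful of bits are miscompared
```

At a 0.14% error rate, only about 14 of 10,000 comparisons flip on average, which is why a modest (1.011x-1.012x) hypervector size increase suffices to restore iso-accuracy.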
Discussion. While HD computing is, at a glance, more resource-inefficient than classical computation for this class of applications, the HD computing paradigm enables the use of emerging hardware technologies that drastically accelerate computation and enable dense storage, such as CAMs and MLC ReRAM. These emerging technologies implement only highly restrictive subsets of computational operators and are error-prone, which can often lead to unpredictable effects on classical computations. Moreover, because the HDC model uses a distributed information representation, it is amenable to several program optimizations that do not typically apply to classical programs. For example, the Heim per-query hypervector sizes and thresholds can be used to soundly compute hypervector distances over smaller sets of bits or to terminate distance calculations early when a match or not-match is guaranteed. The binding, bundling, and permutation operation implementations and the hypervector sizing can also be altered to improve performance, provided the distance relationships between input and output vectors hold. Furthermore, because the encoded information is evenly distributed, algorithms can use highly unusual program transformations. For example, HD data structures can be combined by splicing hypervectors together, distances can be computed over any segment of hypervector bits in any order, queries over data structure hypervectors can be fielded in the middle of a data structure update, and HD queries can be prematurely interrupted to receive a computational result.
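The early-termination optimization mentioned above can be sketched as follows (an illustrative sketch of our own; the function name and structure are not from the paper):

```python
def thresholded_match(a, b, threshold):
    """Decide whether Hamming(a, b) <= threshold, terminating early once
    the outcome is guaranteed: either the running distance already exceeds
    the threshold (not-match), or the remaining unread bits cannot push
    it past the threshold (match)."""
    dist = 0
    remaining = len(a)
    for x, y in zip(a, b):
        remaining -= 1
        dist += x != y
        if dist > threshold:
            return False  # not-match guaranteed: distance can only grow
        if dist + remaining <= threshold:
            return True   # match guaranteed: remaining bits cannot exceed it
    return dist <= threshold

print(thresholded_match([0, 1, 1, 0], [0, 1, 0, 0], threshold=1))  # True
```

Because the encoded information is evenly distributed across bits, the same decision procedure is sound over any prefix or segment of the hypervector, which is what makes this class of transformation safe for HD computations.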
Theoretical Analysis of HD Computation. Kanerva derived the distance distributions for set-recall [Kanerva et al. 1997], and Kleyko derived distance distributions for subset-recall [Kleyko et al. 2016]. We use Kanerva's and Kleyko's theoretical results to develop the Type I and Type II QDS analyses employed by Heim. Researchers have also studied the theoretical capacity and recall accuracy of winner-take-all queries over VSA item memories [Frady et al. 2018; Gallant and Okaywe 2013; Kleyko et al. 2023a,c; Plate 1994; Thomas et al. 2021]. We extend the perception theory [Frady et al. 2018] to develop Heim's WTA accuracy analysis.
Horizontal Thresholding. Kleyko et al. derived an approach for horizontally thresholding distance distributions for subset-set recall queries [Kleyko et al. 2016]. This thresholding does not apply to Type III QDS queries, discards distance sub-ranges, and does not estimate query accuracy or solve for the optimal distance threshold. In contrast, Heim analyzes a broader set of data structure queries (Type I-Type III QDS) and derives an optimal distance threshold to use. Our approach also uses the entire range of distance values when evaluating a query.
Hypervector Size Minimization for Classification. Prior work has primarily focused on dynamic approaches toward hypervector parameter optimization. Imani et al. dynamically tuned hypervector size to explore the trade-off between computational efficiency and accuracy [Imani et al. 2018]. Morris et al. decomposed computational hypervectors into lower-dimensional vectors of a dynamically selected size [Morris et al. 2019]. Basaklar et al. reduced the hypervector size by tuning the level hypervector construction for classification tasks [Basaklar et al. 2021]. Other works have tuned application-specific hypervector parameters, such as level hypervector chunk sizes, to reduce resource usage while maintaining classification accuracy [Imani et al. 2019c]. All approaches mentioned above are heuristic techniques that leverage dynamic tuning over representative inputs and therefore do not offer static guarantees. In contrast, Heim employs statically sound analytical methods that deliver accuracy guarantees, even in the presence of hardware error.

CONCLUSION
We presented Heim, a framework for statically optimizing HD computation parameters to minimize resource usage in the presence of hardware error. Heim produces parametrizations that generalize across queries and data structures and satisfy a target accuracy on expectation. Heim improves on dynamic parameter tuning-based approaches that potentially overfit to test data, provide no guarantees, and take orders of magnitude more time to find parametrizations. We demonstrated that Heim's analysis results can be leveraged to perform aggressive space-saving optimizations without compromising result fidelity and to systematically analyze emerging technologies' benefits and drawbacks while maintaining iso-accuracy. With analysis and programming systems such as Heim, we can enable the development of principled program optimizations that effectively reduce the resource requirements of HD computations without compromising accuracy.
Fig. 8. Y-axis is measured runtime (s) for tuning the hypervector size and threshold, averaged over 10 runs on a single-core machine. The ■ is Heim, ■ is dt-all, ■ is dt-par. The dt-par trendline is omitted in the db-analogy figure because the benchmark uses a WTA query and does not require a threshold.
Fig. 10. Y-axis is simulated runtime (ms) of a single query with gem5, averaged over 20 runs. The ■ is micro, ■ is multi, ■ is cam.

Table 1. Summary of Theoretical Formulations. For each query type, the table lists the threshold predicate over distance(query, DS) and its probability density: Type I (Single Element, Independent Tuple-Set, Section 6.4); Type II (Subset, Independent Tuple-Set, Section 6.5); Type III (Single Element, Independent Product, Section 6.6). Type I QDS predicates test if a code/tuple is in an independent code/tuple set, Type II QDS predicates test if k elements of a code/tuple set are a subset of an independent code/tuple set, and Type III QDS predicates test if a code is an element of an independent product.
Proc. ACM Program. Lang., Vol. 7, No. OOPSLA2, Article 222. Publication date: October 2023.

Table 2. Summary of randomly generated data structure and query characteristics as a function of data structure size (n or (n, m)). Non-standard queries are described in grey. WTA queries only support matches.
Comparison of Heim size to dt-all and dt-par size. dt-par uses hypervector size 10,000 in all benchmarks. ■ and ■ cells have a rat of less than 50% and 80%, respectively.
Bit error measurements were collected by characterizing a ReRAM storage array fabricated in a 130nm logic CMOS process in the BEOL with ECC disabled [Hsieh et al. 2019; Le et al. 2021; Wei et al. 2023]. Though the overall hypervector size increases for 2 BPC memory, the 2x improvement in data density for this memory technology subsumes this size increase. Heim produces 270-27361 bit hypervectors that use 90-9121 memory cells for 3 BPC ReRAM. Though 3 BPC ReRAM is denser than 2 BPC ReRAM, it does not net density improvements because the benchmark HD computations require substantially larger hypervectors to execute accurately in the presence of hardware error.
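The cell counts above follow from dividing the hypervector size by the bits stored per cell and rounding up; a one-line sketch (the function name is ours):

```python
import math

def cells_needed(bits, bits_per_cell):
    """Memory cells required to store a hypervector at a given density (BPC)."""
    return math.ceil(bits / bits_per_cell)

# The 3 BPC figures from the text: 270- and 27361-bit hypervectors.
print(cells_needed(270, 3), cells_needed(27361, 3))  # 90 9121
```

This is why higher BPC only nets a density win when the error-compensating growth in hypervector size stays below the per-cell density gain.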