Cabin: A Compressed Adaptive Binned Scan Index

Scan is a fundamental operation widely used in main-memory analytical database systems. To accelerate scans, previous studies build either record-order or sort-order structures known as scan indices. While achieving good performance, scan indices often incur significant space overhead, limiting their use in main-memory databases. For example, the most recent and best performing scan index, BinDex, consists of a sort-order position array, which is an array of rowIDs in the value order, and a set of record-order bit vectors, representing records in pre-defined value intervals. The structures can be much larger than the base data column size. In this paper, we propose a novel scan index, Cabin, that exploits the following three techniques for better time-space tradeoff. 1) filter sketches that represent every 2^w-2 value intervals with a w-bit sketched vector, thereby exponentially reducing the space for the bit vectors; 2) selective position array that removes the rowID array for a fraction of intervals in order to lower the space overhead for the position array; and 3) data-aware intervals that judiciously select interval boundaries based on the data characteristics to better support popular values in skewed data distributions or categorical attributes. Experimental results show that compared with state-of-the-art scan solutions, Cabin achieves better time-space tradeoff, and attains 1.70 -- 4.48x improvement for average scan performance given the same space budget.


INTRODUCTION
Scan is a fundamental operation widely used in analytical data processing [9, 13, 16-18, 23, 25, 26, 30, 31, 34, 35, 37]. Given a base data column in a table and a filter predicate, a scan operation returns a result bit vector or a rowID list that indicates which records satisfy the predicate. A plain scan reads the data column sequentially and evaluates the predicate for each data value. There are two main ways to improve the plain scan in the literature. The first approach is to optimize the layout of the base data column so that the scan can effectively exploit SIMD to speed up computation, and reduce the amount of data accessed with early pruning techniques [13, 26]. The second approach does not change the base data. Instead, it builds auxiliary data structures (a.k.a. scan indices) to accelerate scans [11, 16, 25, 27, 29]. Compared to the first approach, scan indices have been shown to achieve higher performance. However, scan indices often incur significant space overhead, limiting their use in main-memory databases. In this paper, we aim to improve the time-space tradeoff for scan indices.

57:2 Yiyuan Chen and Shimin Chen
We consider the design principles for scan indices. A scan deals with two data orders: the record order and the sort order. Both the base data and the scan result are in the record order. On the other hand, for a predicate (e.g., <, >, ≤, ≥, =, ≠, or BETWEEN) that compares the data value with constant(s), the evaluation can be better supported if the data is sorted. The two orders are the same if values are sorted in the base data column. In this case, a search on the data column can efficiently compute the result bit vector or rowID list. However, the more common and challenging case is where values are not sorted. This is the focus of previous scan index studies, as well as the focus of our work.
To improve scans, scan indices build either record-order structures or sort-order structures or both.Record-order structures can reduce the amount of access to the base data, while sort-order structures can accelerate the computation of the predicate.
• B+-Tree [11] is a sort-order scan index. A scan performs a search, then follows the linked list of leaf nodes to retrieve all rowIDs that satisfy the predicate. This process is efficient if the predicate selectivity is low. However, for medium to high selectivities, it incurs many expensive random memory accesses.
• Bitmap Index [29] maintains a record-order bit vector for each distinct value. It can retrieve a stored bit vector for a scan with an equality predicate. However, for other predicate types, it has to merge multiple bit vectors, and thus becomes less efficient if more distinct values satisfy the scan predicate.
• Column Sketches [16] create a sketched column in the record order. Value ranges are mapped to small (e.g., 8-bit) code words. Then the sketched column stores a code per record. A scan evaluates the predicate by checking the per-record code in the sketched column. This is sufficient for most records. For a small fraction (e.g., 1/256) of records, the check is uncertain and the scan reads the base data values. In this way, Column Sketches significantly reduce the amount of data to read.
• BinDex [25], the most recent and best performing solution, combines both sort-order and record-order structures to speed up scans. The former is an array of rowIDs (a.k.a. position array) in the sort order of values. The latter is a set of record-order bit vectors, each representing the records in a pre-defined value range. For a scan, the predicate specifies a target value range. BinDex finds the closest pre-defined range to the target range, and copies the associated bit vector as the draft result. Then, it searches the position array for records in the intersection of the target and the pre-defined range, and corrects the corresponding bits in the draft result to obtain the scan result.
However, the improvement of scan performance comes at a price: scan indices introduce significant space cost. Both record-order and sort-order structures store an element per record. The element size can be comparable to or even larger than the value size in the base data. In a B+-Tree [11], there is a (value, rowID) entry per record. Suppose the rowID takes 4B. Then the B+-Tree incurs at least 5x, 3x, 2x, and 1.5x space overhead for 8-bit, 16-bit, 32-bit, and 64-bit values, respectively. In a Bitmap Index [29], every bit vector stores 1 bit per record. The space cost increases linearly with the number of distinct values. For example, the bit vectors for 64 distinct values lead to 1-8x space overhead for 64-bit to 8-bit values. Column Sketches [16] store a (e.g., 8-bit) code per record. Therefore, the space overhead is 0.125-1x, which is low among scan indices. In BinDex [25], the position array contains a (e.g., 4B) rowID per record. Each bit vector costs 1 bit per record. The number of stored bit vectors is determined by the number of pre-defined value ranges. For example, if there are 64 pre-defined ranges, then BinDex sees 1.5-12x space overhead for 64-bit to 8-bit values. As the number of pre-defined ranges increases, BinDex performs better because the intersection between a target range and its closest pre-defined range tends to become smaller.
Therefore, for achieving higher scan performance, it can be necessary to store more bit vectors, incurring even higher space overhead.
In this paper, we propose a novel scan index, Cabin (compressed adaptive binned scan index). Cabin exploits the following three techniques for better time-space tradeoff:
• Filter Sketches: BinDex stores a bit vector for each value range, which consumes a lot of space. To reduce space, we replace every 2^w − 2 bit vectors with a w-bit sketched vector (e.g., w ≤ 9). A w-bit code can represent 2^w − 2 value intervals (plus two reserved virtual intervals for all values less than or greater than the set of intervals). We call the resulting record-order structure filter sketches. Cabin divides all the value intervals into groups of 2^w − 2 intervals each, and builds w-bit filter sketches per interval group. In this way, Cabin reduces the space cost of bit vectors by a factor of (2^w − 2)/w. In addition, we describe a novel encoding and computation scheme, MLO, that minimizes the number of SIMD operations for generating a draft result from filter sketches given a pre-defined value range.
• Selective Position Array: The position array in BinDex incurs a fixed space overhead. If the space budget is tight, we propose to selectively store the position array, i.e., storing rowIDs for a fraction of the intervals. If the predicate of a scan hits a value interval whose rowIDs are not stored, Cabin relies on the filter sketches for evaluating the predicate. Note that our design of filter sketches covers the entire value range (−∞, +∞) for every interval group (with the two virtual intervals). Consequently, we can use filter sketches in a similar fashion to Column Sketches to support scans.
• Data-aware Intervals: BinDex divides the full value range into even-sized intervals for obtaining the pre-defined value ranges. This works well when the number of duplicates of each distinct value is small. However, real-world data may contain a lot of duplicates because of skewed data distributions (e.g., power-law graphs) or low unique value counts (e.g., categorical attributes). In such cases, Cabin judiciously makes a data-aware selection of intervals to optimize for popular values.
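To make the space saving of filter sketches concrete, here is a small arithmetic sketch of the bit-vector reduction factor (the range of w shown is illustrative):

```python
# Space-reduction factor of filter sketches: a group of 2^w - 2 intervals
# needs only w sketched bit vectors instead of one bit vector per interval,
# shrinking the bit-vector space by (2^w - 2) / w.
def sketch_reduction_factor(w: int) -> float:
    return (2 ** w - 2) / w

# e.g., w = 8 replaces 254 per-interval bit vectors with 8 sketched vectors
factors = {w: sketch_reduction_factor(w) for w in range(4, 10)}
```

For w = 8, the factor is 254/8 ≈ 31.8x, which is where the exponential reduction claimed above comes from.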
Contributions. The main contributions of this paper are threefold. First, we propose a scan index, Cabin, with three novel techniques, i.e., filter sketches, selective position array, and data-aware intervals, to improve the time-space tradeoff for scans. Second, we model the time and space cost of Cabin, and design an algorithm to compute the optimal design parameters for Cabin. Finally, we perform an extensive experimental evaluation of Cabin under various data workloads. Our experimental results show that compared with state-of-the-art scan solutions, Cabin achieves better time-space tradeoff, and attains 1.70-4.48x improvement for average scan performance given the same space budget. In addition, we evaluate Cabin in a full-fledged main-memory analytical database system, MonetDB, using a subset of queries in the TPC-H and SSB benchmarks. Our results show that Cabin can improve the query performance of MonetDB by a factor of 1.1-49.9x.
Organization. The rest of the paper is organized as follows. Section 2 provides the background on existing scan solutions. Section 3 presents the Cabin design. Then, Section 4 analyzes the space and time cost of Cabin for optimal design selection. After that, Section 5 presents the experimental evaluation. Section 6 discusses a number of practical issues. Finally, Section 7 concludes the paper.

Data Scan
Problem Definition. In this paper, we focus on single-column data scan operations. We consider the same problem definition as in previous studies [13, 16, 25, 26].
• Base data: A column of fixed-sized values, which is the common storage layout in main-memory analytical databases [6, 8, 12, 22, 32, 38]. Note that variable-sized fields are often encoded into fixed-sized codes with dictionary encoding [7, 12].
• Query predicate: A filter predicate (e.g., <, >, ≤, ≥, =, ≠, or BETWEEN) that specifies a value range of the input data.
• Output: A result bit vector, whose i-th bit is set to 1 if the i-th record satisfies the predicate.
Compared to other return formats (e.g., rowID list), bit vectors can be efficiently merged with bitwise operations to compute complex predicates [10].
Two Ways to Optimize Scans. A plain scan reads the base data sequentially and checks if each value satisfies the filter predicate. There are two main ways to improve plain scans in the literature.
The first approach is to optimize the layout of the base data. The design goals are to exploit SIMD and/or to reduce the amount of data accessed by the scan. BitWeaving [26] proposes bit-level layouts. In comparison, ByteSlice [13] proposes a byte-level columnar layout, where the i-th byte of all values is stored contiguously. Predicate evaluation employs SIMD to compare multiple values from the most to the least significant byte. It stops early if the higher-order bytes already determine the predicate outcome. This early pruning technique can significantly reduce the amount of data to read.
The second approach does not change the base data. Instead, it builds auxiliary data structures to improve scan performance. Examples include B+-Tree [11], Bitmap Index [29], Zone Maps [27], Column Sketches [16], and BinDex [25]. We call such auxiliary data structures scan indices. Compared to the first approach, scan indices can perform better, but often incur significant space overhead. We aim to design a scan index with better time-space tradeoff.
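As a reference point for the problem definition above, a minimal plain-scan baseline that produces a result bit vector can be sketched as follows (bit i of a Python int stands for record i):

```python
# A minimal plain-scan baseline: evaluate a range predicate over a column
# and return the result as a bit vector (bit i = 1 iff record i matches).
def plain_scan(column, lo, hi):
    """Return a bit vector for the predicate lo <= v <= hi."""
    result = 0
    for i, v in enumerate(column):
        if lo <= v <= hi:
            result |= 1 << i
    return result

col = [5, 23, 11, 7, 30]
bv = plain_scan(col, 7, 23)   # records 1, 2, 3 match
```

Every technique discussed in this paper computes the same output; the differences lie in how much data is touched and how the predicate is evaluated.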

Scan Index
Let us consider the principles of designing a scan index. From the problem definition, we see that there are two orders: the record order and the sort order. Both the base data and the output bit vector are in the record order, while predicate evaluation can be more efficient if the values are sorted. As a result, it is natural to consider three types of structures in scan indices:
• Record-order structure: It contains one item for every value in the base data. The order of the items is the record order of the base data. An example is the bit vector in Bitmap Index.
• Sort-order structure: It contains one item for every value in the base data. The order of the items is the sort order of the values. An example is the B+-Tree leaf nodes.
• Summary structure: The structure is small. It contains aggregates of the base data, or metadata of the other structures.
We examine the structures of each scan index, as shown in Table 1. Figure 1 depicts a high-level picture of the time-space tradeoff of scan solutions. (Please see the space-time analysis under different workloads in our experiments in Section 5.)
B+-Trees [11]. In a B+-Tree, the leaf level stores the list of (value, rowID)'s in the sort order. This consumes more space than the base data. A scan performs a search to reach the leaf level, then follows the linked list of leaf nodes to retrieve all rowIDs that satisfy the predicate and sets the relevant bits in the result bit vector. This is efficient when the predicate selectivity is low, as shown in Figure 1. However, it can be costly with medium to high selectivities (not shown) because of pointer chasing and random memory accesses.
Bitmap Index [29]. Bitmap Index stores a bit vector for each distinct value, and a summary structure that maps keys to vectors. For an equality predicate, it returns the relevant bit vector, which is very fast. However, there are two main problems. First, for common data columns with many distinct values, Bitmap Index can be much larger than other scan indices. Second, predicates that specify value ranges often require merging multiple bit vectors, leading to degraded performance. Hence, we do not further discuss or evaluate Bitmap Index in the paper. Note that the above problems can be addressed by more sophisticated use of bitmaps, such as the range-based bit vectors in BinDex and the filter sketches in our solution.
Zone Maps [27]. It divides the input data into buckets, and maintains per-bucket aggregates (e.g., min, max). The summary structure incurs little space overhead, as shown in Figure 1. For a scan, Zone Maps skip a bucket if the bucket aggregates show that the entire bucket satisfies or dissatisfies the predicate. However, bucket skipping is effective only when values are mostly sorted or the selectivity is very low or very high. In other cases, Zone Maps essentially perform a plain scan, making it the slowest scan index.
Column Sketches [16]. As shown in Figure 2(a), Column Sketches consist of a sketched column and a compression map. The latter maps value ranges to codes. The former contains a code per value in the record order. In the example, v_0 = 5 ∈ (−∞, 8], thus its code is 000; v_1 = 23 ∈ (22, 28], so its code is 100. The space cost is n · b/8 bytes, where n is the number of records and b is the code width in bits. Since b (e.g., 8 bits) is often smaller than the value size (e.g., 64 bits), Column Sketches consume less space than the base data.
A scan first maps the predicate value to the (predicate) code. Then, it compares each code in the sketched column with the predicate code. In most cases, code ≠ predicate code, and the comparison suffices. When code = predicate code, the scan has to check the value in the base data. Therefore, the scan reads the sketched column and about 1/2^b (e.g., 1/256 for b = 8) of the base data. Consequently, Column Sketches significantly reduce the amount of data to read.
BinDex [25]. BinDex combines features of B+-Tree and Bitmap Index, as shown in Figure 2(b). First, the position array and the virtual sorted values are similar to the sorted (value, rowID)'s in the B+-Tree's leaf level. To reduce space cost, BinDex stores only the rowIDs; the value part is virtual and is read from the base data via the rowIDs when needed.
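The Column Sketches probe described above can be sketched as follows; the boundaries, data, and helper names are illustrative, not the system's actual API:

```python
import bisect

# A sketch of Column Sketches: map value ranges to small codes, store one
# code per record, and fall back to the base data only when the record's
# code equals the predicate code (the uncertain case).
bounds = [8, 14, 22, 28]     # ranges (-inf,8], (8,14], (14,22], (22,28], (28,+inf)
base = [5, 23, 11, 30, 8]    # base column in record order

def code(v):
    return bisect.bisect_left(bounds, v)   # code 0 .. len(bounds)

sketched = [code(v) for v in base]         # the sketched column

def scan_le(c):
    """Result bit vector for the predicate v <= c."""
    pc, result = code(c), 0
    for i, x in enumerate(sketched):
        if x < pc:                          # code alone decides: matches
            result |= 1 << i
        elif x == pc and base[i] <= c:      # uncertain: check base data
            result |= 1 << i
    return result
```

Only records whose code equals the predicate code trigger a base-data read, which is the source of the 1/2^b access fraction stated above.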

For a scan (e.g., v ≤ 10), BinDex uses the area map to locate the corresponding interval (e.g., I_2 = [9, 12)). Then, it performs a binary search in the interval, which follows the rowIDs to read the values in the base data. For example, the search finds the predicate value 10 at the second entry in I_2. After that, BinDex chooses and copies a filter bit vector (e.g., F_2, which represents (−∞, 12)) as the draft result. Finally, it refines the draft result by flipping the wrong bits (e.g., bit 14, which corresponds to value 11).
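The BinDex scan steps above can be sketched as follows. The data and boundaries are illustrative, and the refinement loop scans the position array linearly for clarity, whereas BinDex binary-searches within a single interval:

```python
# A sketch of the BinDex-style scan for "v <= c": copy the filter bit
# vector of the nearest pre-defined range, then flip the wrong bits found
# via the position array.
base = [9, 12, 3, 11, 7, 10]                          # base column, record order
pos = sorted(range(len(base)), key=lambda i: base[i])  # position array (rowIDs)
bounds = [6, 9, 12]                                    # pre-defined boundaries

def filter_vector(b):
    """Pre-built record-order bit vector for all records with value < b."""
    bv = 0
    for i, v in enumerate(base):
        if v < b:
            bv |= 1 << i
    return bv

def bindex_scan_le(c):
    upper = min((x for x in bounds if x > c), default=None)
    if upper is None:                   # c is beyond the last boundary
        return (1 << len(base)) - 1
    draft = filter_vector(upper)        # copy the closest filter bit vector
    for rid in pos:                     # refine: clear bits for c < v < upper
        if c < base[rid] < upper:
            draft &= ~(1 << rid)
    return draft
```

The farther c lies from the chosen boundary, the more bits must be flipped, which is why more pre-defined ranges improve BinDex's performance.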
Suppose there are n records and k intervals. Then, there are k − 1 filter bit vectors. Suppose each rowID takes p bits. Thus, the space cost of BinDex is n(p + k − 1)/8 bytes. The scan time consists of checking the area map (O(log k)), searching the interval (O(log(n/k))), copying the filter bit vector as the draft result (i.e., sequentially copying n/8 bytes), and refining the draft result (i.e., randomly accessing n/(4k) bits on average). Figure 1 shows the BinDex curve varying the number of intervals k. As k increases, the space cost increases linearly, while the number of random bit accesses decreases, leading to better performance.
Motivation of Our Solution: Cabin. While BinDex achieves the best performance among existing scan indices, it pays substantial space overhead, as shown in Figure 1. We aim to achieve better time-space tradeoff based on three observations:
• Observation 1: It is costly to keep a filter bit vector per interval. We propose filter sketches, which represent multiple intervals using sketch codes to reduce space.
• Observation 2: The position array takes a fixed amount of space. If the space budget is tight, we propose the selective position array, i.e., storing position arrays for a fraction of the intervals. If the predicate falls into an interval without a position array, the scan can use the filter sketches in a similar fashion to Column Sketches.
• Observation 3: When the number of duplicate values is high, the interval boundaries should be judiciously chosen for time and space efficiency. Hence, we study data-aware intervals.

CABIN DESIGN
We propose Cabin, a compressed adaptive binned scan index. In the following, Section 3.1 overviews the design. Then, Sections 3.2-3.4 describe the three distinctive features of Cabin, i.e., filter sketches, selective position array, and data-aware intervals, respectively.
• Filter sketches: According to Observation 1, Cabin divides the full value range into k = g(2^w − 2) intervals. Then we divide all the intervals into groups of 2^w − 2 intervals each, and build w filter sketches for every group of 2^w − 2 intervals. The i-th interval in group j is denoted as I_{j,i}. I_{j,0} and I_{j,2^w−1} are two special virtual intervals, representing values less than and greater than the entire group G_j. In this way, I_{j,0}, I_{j,1}, ..., I_{j,2^w−1} cover the full value range. (This is important for supporting the selective position array.) Note that while Figure 2(c) draws only one group of intervals due to paper space constraints, there are usually multiple interval groups in a Cabin.
• Selective position array: According to Observation 2, we can remove the portion of the position array for a subset of intervals to save space. As shown in Figure 2(c), the shaded portion of the selective position array is not physically stored. However, an interval without a physical position array may be visited by a scan. In such cases, we cannot perform a binary search for the predicate value in the position array. Instead, Cabin relies on the filter sketches for evaluating the predicate. Since the filter sketches of an interval group cover the full range of values, they can be used in a similar fashion to Column Sketches.
• (Data-aware) interval table: As shown in Figure 2(c), the interval table keeps the metadata of each interval in each group, including the value range, the sketch code word, the number of records in the interval, its start location in the selective position array, and a flag used by the selective position array and data-aware intervals to indicate the interval types. To build the interval table, it is natural to divide the value range into even-sized intervals. However, according to Observation 3, even-sized intervals may be sub-optimal when
there are a large number of duplicate values. In such cases, Cabin judiciously makes a data-aware selection of intervals to further optimize the scan. In the following subsections, we describe three Cabin designs addressing the three observations progressively:

Cabin Overview
• Cabin_F: A design with filter sketches, even-sized intervals, and a fully physical position array (cf. Section 3.2)
• Cabin_FS: Cabin_F with selective position array (cf. Section 3.3)
• Cabin_FSD: Cabin_FS with data-aware intervals (cf. Section 3.4)
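A minimal sketch of even-sized interval-table construction follows; the field names are hypothetical but mirror the metadata listed in the overview (value range, sketch code, flag):

```python
# A sketch of even-sized interval-table construction. Each group holds
# 2^w - 2 real intervals; codes 0 and 2^w - 1 are reserved for the two
# virtual intervals (values below / above the whole group).
def build_interval_table(boundaries, w):
    per_group = 2 ** w - 2
    table = []                              # one entry per real interval
    for idx in range(len(boundaries) - 1):
        j, i = divmod(idx, per_group)
        table.append({
            "group": j,
            "code": i + 1,                  # codes 1 .. 2^w - 2 in the group
            "range": (boundaries[idx], boundaries[idx + 1]),
            "flag": 1,                      # 1 = position array stored
        })
    return table

tbl = build_interval_table(list(range(0, 29, 4)), w=3)   # 7 intervals, 6 per group
```

With w = 3, the seventh interval spills into a second group, illustrating why a Cabin usually has multiple interval groups.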
For example, consider a scan v ≤ 10 in Figure 2(c). (1) Cabin uses the interval table to locate the target interval I_{0,2}, which contains 4 records and starts at P[4]; Cabin performs a binary search between P[4] and P[7] of the position array. The split point is P[5], which refers to r_14 = 11. (2) There are two options to generate the draft result: a) compute a bit vector for all values less than I_{0,2} (i.e., v < 10, or the union of I_{0,0} and I_{0,1}), then flip the bits to the left of the split point in I_{0,2} (i.e., P[4]); or b) compute the bit vector representing values up to and including I_{0,2} (i.e., v < 16, or the union of I_{0,0}, I_{0,1}, and I_{0,2}), then flip the bits at and to the right of the split point in I_{0,2} (i.e., P[5], P[6], and P[7]). Cabin chooses a) since it minimizes the number of bit flips. (3) Cabin combines the set of filter sketches in group 0 to compute the draft result for (−∞, 10). (4) Cabin flips the bit for P[4] (i.e., bit 17) in the draft result to obtain the final result.
Compared to BinDex, Steps (1), (2), and (4) are similar. The main novel feature of Cabin_F resides in Step (3). In the following, we design filter sketches to optimize the draft result computation. Then, we describe the scan algorithm for Cabin_F.
Vertical Bit Layout for Filter Sketches. Let us first consider a horizontal layout for filter sketches. In our design, the optimal sketch code width w is often less than 8. As a result, sketch codes may span byte boundaries, incurring computation costs to reconstruct the codes. Therefore, we store filter sketches in a vertical bit layout, as shown in Figure 2(c). S_{0,b} (b = 0, ..., w − 1) stores bit b of all the codes. This layout enables the use of SIMD logical operations to efficiently generate draft results.
Draft Bit Vector Computation Problem. We focus on one of the interval groups. Since the computation is the same regardless of the group, we omit the group ID below for simplicity. Following the example, we first consider a predicate "v ≤ c".
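The vertical bit layout can be sketched as follows; `to_vertical` is an illustrative helper, with plane b holding bit b of every record's code:

```python
# A sketch of the vertical bit layout: given one w-bit sketch code per
# record, store bit b of every code in its own bit vector planes[b]
# (a Python int here, a sequence of SIMD words in a real implementation).
def to_vertical(codes, w):
    planes = [0] * w
    for i, c in enumerate(codes):
        for b in range(w):
            if (c >> b) & 1:
                planes[b] |= 1 << i
    return planes

codes = [0b101, 0b010, 0b111]      # three records, w = 3
S = to_vertical(codes, 3)
```

Because each plane packs one bit per record, a single SIMD logical operation on a plane processes an entire segment of records at once.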
From Step (2), we see that the draft result is the union of intervals I_0 ... I_t for some t. Let the code word of interval I_i be e(i). Then the set of code words associated with I_0 ... I_t is E_t = {e(0), ..., e(t)}. For every code x in the filter sketches, we set the corresponding bit in the draft result to 1 iff x ∈ E_t.
Then, two related questions arise: 1) What is the encoding e(i)? 2) How do we compute x ∈ E_t efficiently?
Our Solution: MLO. BitWeaving [26] proposed the VBP vertical bit layout. As shown in Figure 3(a), e(i) = i in VBP. To compute x ∈ E_t, VBP checks each bit from the most to the least significant in a loop. Each loop iteration updates two intermediate masks, which represent the less-than and the equal-to cases. If we omit ¬ (since combined operations, such as andnot, may exist as a single SIMD instruction), VBP performs 5w + 1 SIMD operations for w-bit codes.
We propose MLO (minimal logical operations), a novel encoding and computation scheme that minimizes the number of operations. Unlike VBP, MLO avoids the per-bit loop. Instead, we derive a boolean logical formula to compute x ∈ E_t.
In MLO, let x = x_{w−1} x_{w−2} ... x_0 be a code word. Our target is the function f_w(x, E_t) s.t. f_w(x, E_t) = 1 iff x ∈ E_t. The subscript w denotes the number of bits in the code word. We derive the function's formula by recursion. Suppose f_j(x, E_t) is 1 iff x ∈ E_t when we focus on the lower j bits of the codes. Starting from the base case f_0 = 1, each recurrence step extends f_{j−1} to f_j with a single logical operation. Based on the recurrence relations, we can prove that f_w(x, E_t) can be evaluated with at most w SIMD operations per SIMD segment of code words.
Scan Algorithms for Other Comparison Operators. The scan algorithms for the other comparison operators are similar to Algorithm 1. We discuss them in the following:
• "v < c": "v < c" can be transformed to "v ≤ c′", where c′ is the next value less than c. As there are no other values between c and c′, the two predicates have the same result.
• "v ≥ c" and "v > c": To compute "v ≥ c", the basic idea is to compute "v < c" and negate the result bit vector. After the draft generation, we perform one additional SIMD operation to negate the result. In this way, we obtain the negation of the result bit vector after the bit flips. Similarly, "v > c" can be computed by negating the result of "v ≤ c". In these cases, MLO performs up to w + 1 SIMD operations for every SIMD segment of code words.
• "c_1 ≤ v ≤ c_2": In Cabin, this case is transformed into "v ≥ c_1" and "v ≤ c_2". Then, Cabin computes the bitwise AND of the result bit vectors of "v ≥ c_1" and "v ≤ c_2".
• "v = c" and "v ≠ c": "v = c" is transformed into "c ≤ v ≤ c". "v ≠ c" is supported by negating the result of "v = c".
Shortcut Optimization. BinDex takes a different approach for evaluating "v = c". The idea is to start with an all-zero draft vector, then search the position array to find the small number of records that satisfy the predicate, and finally set the corresponding bits in the draft vector. We call this approach the shortcut optimization. The shortcut optimization reduces the cost of computing the draft bit vector from filter sketches. We find this beneficial for comparison operators (e.g., BETWEEN) other than "=" when the selectivity is low. Therefore, we extend the use of the shortcut optimization to all comparison operators. After searching the selective position array, Cabin knows the selectivity of the scan. If it is lower than a given threshold (e.g., 0.5%), Cabin performs the shortcut optimization.
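The MLO-style membership test for "v ≤ c" can be sketched in bit-parallel form. This sketch assumes the identity encoding e(i) = i, so that E_t corresponds to codes ≤ t; the actual MLO encoding is part of the paper's design. One logical operation per bit plane matches the up-to-w bound stated above:

```python
# A bit-parallel sketch of the recurrence for "code <= t" over the vertical
# layout, assuming codes are assigned in increasing interval order (e(i)=i).
# Each bit plane contributes exactly one OR-NOT or AND-NOT step.
def codes_le(planes, t, n):
    """Bit i of the result is 1 iff record i's sketch code is <= t."""
    ones = (1 << n) - 1
    f = ones                               # base case: f_0 = 1 everywhere
    for b, plane in enumerate(planes):     # least significant bit first
        not_b = plane ^ ones               # bit-parallel ~x_b (andnot in SIMD)
        f = (not_b | f) if (t >> b) & 1 else (not_b & f)
    return f

# 4 records with codes 0, 1, 2, 3 (w = 2), stored as vertical planes
planes = [0b1010, 0b1100]                  # planes[b] holds bit b of every code
```

With t = 2, the result sets bits for the three records whose codes are 0, 1, or 2, i.e., the union of intervals I_0 ... I_2 used as the draft result.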

Selective Position Array
Let us consider the space cost of the position array. For 4-byte rowIDs, it takes 4n bytes, where n is the number of records. Since the rowID size can be similar to or even larger than the data value size, this incurs significant space overhead. For example, the rowIDs of 1 billion records take nearly 4GB of memory space.
To save memory space, we propose to selectively store portions of the position array. That is, for a subset of intervals, we do not store their rowIDs, and set their selective flags to 0 in the interval table. This allows a more flexible time-space tradeoff.
Scan Algorithm Without Selective Position Array. The scan algorithm of Cabin_FS is listed in Algorithm 2. Compared to Cabin_F, the main difference is that Cabin_FS cannot search the target interval using the selective position array in order to refine the draft bit vector. Instead, we use the filter sketches in a similar fashion to Column Sketches to refine the draft result:
1. Prepare (Lines 2-3).
2. Find Matching Interval (Line 4). Since the portion of the position array for I_{j,i} is not stored, we cannot search I_{j,i}. Cabin_FS simply sets the matching set to all intervals preceding I_{j,i}.
3. Generate Draft Bit Vector (Lines 5-6). This step is the same as in Cabin_F. The draft-generation procedure is invoked to obtain the draft bit vector for every SIMD segment of code words.
4. Refine Bit Vector (Lines 7-10). For a value in I_{j,i}, filter sketches alone cannot tell the predicate outcome. In such cases, Cabin_FS visits the base data to evaluate the predicate. Accordingly, the algorithm obtains all the positions whose sketch code is equal to the code of I_{j,i} (Lines 7-8), then checks the corresponding base value against the predicate (Lines 9-10).
Sketch Code Equality Comparison. We develop a boolean logical function h_w(x, c_t), where c_t is the code of the target interval I_{j,i}. Let x = x_{w−1} x_{w−2} ... x_0 be a code word. h_w(x, c_t) = 1 iff x = c_t. We also derive the formula by recursion, starting from the base case h_0 = 1. For example, h_3(x, 010) = ¬x_2 ∧ x_1 ∧ ¬x_0. Note that Intel processors support andnot as a single SIMD instruction. Therefore, h_w(x, c_t) takes up to w SIMD operations to compute for each SIMD segment.
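The sketch-code equality comparison can be sketched in the same bit-parallel style (illustrative helper; each plane holds one bit of every code, as in the vertical layout):

```python
# A sketch of the sketch-code equality test: AND in plane b when bit b of
# the target code is 1, and its complement when it is 0. For code 010 this
# yields ~x2 & x1 & ~x0 -- the same shape as the example in the text.
def codes_eq(planes, c, n):
    """Bit i of the result is 1 iff record i's sketch code equals c."""
    ones = (1 << n) - 1
    g = ones
    for b, plane in enumerate(planes):
        g &= plane if (c >> b) & 1 else (plane ^ ones)   # andnot in SIMD
    return g

# 4 records with codes 2, 5, 2, 0 (w = 3), as vertical bit planes
planes = [0b0010, 0b0101, 0b0010]
```

The resulting bit vector marks exactly the records whose base values must be fetched and checked in the refine step.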

Data-aware Intervals
In Cabin_F and Cabin_FS, we divide the value range into equal-sized intervals. This strategy works well when the number of duplicates for each distinct value is small. However, real-world data may contain a lot of duplicates. Two representative scenarios are as follows:
• Skewed data distribution: Real-world data distribution is often skewed. For example, the power-law distribution is common in real-world graphs, such as social networks [5]. In this scenario, a few values may have a large number of duplicates, while the number of duplicates of most values is small.
• Small number of distinct values (NDV): The number of distinct values can be small in a data column. Examples are categorical attributes. In this scenario, the number of duplicates can be quite large for each distinct value.

We call values with a large number of duplicates popular values. Cabin_FSD judiciously chooses intervals, interval groups, and/or sketch code widths to better support popular values.
In the following, we describe the steps of index building with an emphasis on data-aware intervals:
Index Building Step 1: Popular Value Discovery. Given a base data column, we consider three alternative methods for discovering popular values. First, the sort-based method allocates a temporary sort buffer and sorts the (value, rowID) pairs in value order. It scans the sorted array and computes the number of duplicates for each value. Second, the hash-based method allocates a temporary hash table and counts the number of duplicates per value using the hash table. Third, the sample-based method collects a small (e.g., 1%) sample from the base data. Then, it applies the sort-based method on the sample. Since popular values appear in the sample with high probability, we can use the sample-based estimates to discover popular values. We record a value as a candidate popular value if its number of occurrences is ≥ B_p. The bound B_p is computed as the record count n divided by the maximal number of intervals given the memory space budget of Cabin. We also collect/estimate other meta-information about the base data, such as the value range and the number of distinct values.
Index Building Step 2: Optimal Design Selection. Given the memory space budget and the data characteristics, we compute the optimal design parameters (cf. Section 4). Then, we compute s_avg, the average interval size, as n divided by the number of intervals in the optimal parameters. Since s_avg > B_p in most cases, we remove unqualified candidates to obtain the popular values. For each popular value, we put the value into its own interval. We call such an interval a skew interval. If the number of duplicates is larger than the average interval group size s_avg(2^w − 2), then we put the value into its own interval group. We call such an interval group a skew interval group. The code width of all skew interval groups is set to 1. As for unpopular normal values, we create even-sized intervals.
Index Building Step 3: Structure Building. Given the selected design
parameters in Step 2, we build the interval table, the selective position array, and the filter sketches. For skew intervals and skew interval groups, we do not store the associated portions of the position array. As a skew interval contains only a single value, scans do not need to search the position array in the interval.
Optimizing for Popular Values. Figure 4 illustrates data-aware intervals for popular values. There are two popular values in the base data: 50 and 2. First, 50 occupies its own skew interval group (i.e., Group 1). A skew interval group contains only two virtual intervals, (−∞, v_p] and (v_p, +∞), where v_p is the popular value. We set flag=3 to indicate a skew interval group. Second, 2 has its own skew interval in Group 0 (i.e., I_{0,1}). We set flag=2 for skew intervals. For a scan with predicate v ≤ c, Cabin_FSD finds the target interval in the interval table. There are two cases for popular values. First, the target is a skew interval group with flag=3 (e.g., c = 50). Then the scan directly returns the single filter sketch (e.g., S_{1,0}). Second, the target interval is a skew interval in a normal group (e.g., c = 2). Then the scan performs a simplified Cabin_F scan. Specifically, since the skew interval contains a single value, there is no need to refine the draft result. The draft result bit vector can be returned directly. In both cases, Cabin_FSD simplifies the scan procedure, thereby improving scan performance for popular values.
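The sample-based discovery of Step 1 can be sketched as follows; the sampling rate, seed, and helper name are illustrative, and sample counts are scaled back up before comparing against the bound B_p:

```python
import random
from collections import Counter

# A sketch of sample-based popular-value discovery: count duplicates in a
# small random sample, scale the counts to full-column estimates, and keep
# values whose estimated count reaches the bound.
def popular_values(column, bound, rate=0.01, seed=42):
    rng = random.Random(seed)
    sample = [v for v in column if rng.random() < rate]
    counts = Counter(sample)
    # scale sample counts back to full-column estimates
    return {v for v, c in counts.items() if c / rate >= bound}
```

A hash- or sort-based method gives exact counts at a higher build cost; sampling trades a small chance of missing a borderline value for a much cheaper pass.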

OPTIMAL DESIGN SELECTION
We model the time and space cost of Cabin, and then describe how to compute the optimal design parameters.

Design Parameters of Cabin. There are three important design parameters in Cabin. (Table 2 lists the terms used in the analysis.)

• Sketch code word width (w). As w increases, the number of bit vectors in the filter sketches increases, and the cost of draft bit vector generation becomes proportionally more expensive. On the other hand, the number of intervals per group (2^w − 2) increases exponentially. The interval size becomes smaller, leading to fewer bit flips in the refine step.

• Number of groups (g). Every contiguous 2^w − 2 intervals form a group. The total number of non-virtual intervals is n = g(2^w − 2). As g rises, n increases. The interval size decreases, and therefore the refine step improves. However, the space for storing all filter sketches grows as g increases. The optimal g is often larger than 1 (cf. Section 5.4).

• Stored proportion of the selective position array (p). p ∈ [0, 1] is the fraction of rowIDs physically stored in the selective position array. For even-sized intervals, p · n gives the number of intervals whose flag=1 in the interval table. In this work, we assume that every data record is equally likely to be the query predicate value. Hence, the probability that the predicate value falls into each equal-sized interval is the same. Therefore, we randomly pick the intervals that do not store the position array.
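As a concrete illustration of the parameters w (sketch code width), g (number of groups), and p (stored proportion), the following sketch computes the interval count and the dominant space terms. The function name and byte-level accounting are ours; the formulas follow the analysis in the text (n = g(2^w − 2) intervals, w N-bit vectors of filter sketches per group, and a proportion p of N 32-bit rowIDs).

```python
def cabin_layout(w, g, N, rowid_bits=32, p=1.0):
    """Interval count and space cost (bytes) for given design parameters.

    n = g * (2^w - 2) non-virtual intervals; filter sketches take
    g * w * N / 8 bytes; the selective position array stores a
    proportion p of the N rowIDs.
    """
    n = g * (2 ** w - 2)
    sketches_bytes = g * w * N // 8
    pos_array_bytes = int(p * (N * rowid_bits // 8))
    return n, sketches_bytes, pos_array_bytes

# w = 5, g = 6 as in the uniform 32-bit experiment (Section 5.2):
# 6 groups of 30 intervals each, i.e., 180 intervals in total.
n, sk, pa = cabin_layout(w=5, g=6, N=10 ** 9)
assert n == 180
```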
Scan Performance. Let P_{i,j} be the probability that the scan predicate value falls into interval I_{i,j}, and let |I_{i,j}| be the number of records in I_{i,j}. If we assume that every base data record is equally likely to be queried, then P_{i,j} = |I_{i,j}|/N. Let I_0 be the set of all intervals whose flag=0 (i.e., the rowIDs are not stored), I_1 the set of all intervals whose flag=1 (i.e., with stored position array), and I_s the set of all skew intervals. Then the average scan time of Cabin can be modeled as follows:

T_avg = Σ_{I_{i,j} ∈ I_1} P_{i,j} · T_F(i,j) + Σ_{I_{i,j} ∈ I_0} P_{i,j} · T_FS(i,j) + Σ_{I_{i,j} ∈ I_s} P_{i,j} · T_FSD(i,j)

For popular values, Cabin does not store the rowIDs. Therefore, we subtract the number of records with popular values from N in the computation of the selective position array size. As for the filter sketches, there are two cases. For a skew interval group, Cabin stores only a single N-bit vector. For a skew interval in a normal group, the size of the filter sketches is the same as that of normal intervals.

Optimal Design Selection. Given a memory space budget, Algorithm 3 finds the design parameters that optimize the scan time of Cabin. It begins by obtaining the candidate popular values (Line 2, cf. Section 3.4). Then, it enumerates every pair of (w, g). We restrict w to be at most 9, which is sufficient in our experiments with up to a billion records. The range of g is bounded by the cases where the selective position array is entirely stored or entirely removed. Therefore, the procedure is efficient because the search space is limited. For each pair of (w, g), the algorithm computes the set of skew intervals from the popular values (Line 8, cf. Section 3.4). Then, for unpopular values, it computes equal-sized intervals (Line 9). After that, the algorithm examines the space cost of the filter sketches and the selective position array, taking into account both skew intervals and normal intervals. It computes p (Line 10) and chooses a subset of intervals from which to remove the associated portion of the selective position array (Line 11). This completes the tentative design for all intervals. The scan time is estimated using the above formulas (Line 12). Finally, the algorithm returns the design parameters that obtain the minimal scan time.
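The enumeration can be sketched as follows. This is an illustrative stand-in for Algorithm 3 rather than a faithful implementation: the toy cost function, the loop bounds, and the way the remaining budget is converted into p are our own simplifications; the real algorithm also handles skew intervals.

```python
def design_selection(N, space_budget_bytes, est_scan_time, w_max=9,
                     rowid_bits=32):
    """Enumerate (w, g, p) and keep the design with the minimal estimated
    scan time, mirroring the structure of Algorithm 3. est_scan_time(w, g, p)
    stands in for the paper's cost model and is supplied by the caller."""
    best = None
    best_time = float("inf")
    for w in range(2, w_max + 1):
        g = 1
        while True:
            sketch_bytes = g * w * N // 8   # filter sketches for g groups
            if sketch_bytes > space_budget_bytes:
                break                       # no room left even with p = 0
            # Spend the remaining budget on the selective position array.
            remaining = space_budget_bytes - sketch_bytes
            p = min(1.0, remaining / (N * rowid_bits // 8))
            t = est_scan_time(w, g, p)
            if t < best_time:
                best_time, best = t, (w, g, p)
            g += 1
    return best

# Toy cost model: more intervals shrink the refine cost, wider codes add
# draft cost, a smaller p adds refine cost (purely illustrative weights).
def toy_cost(w, g, p):
    n = g * (2 ** w - 2)
    return 1.0 / n + 0.01 * w + 0.5 * (1.0 - p)

design = design_selection(10 ** 6, 8 * 10 ** 6, toy_cost)
```

The search space is small: w ranges over at most eight values, and g is bounded by the space budget, so the whole enumeration completes in microseconds, consistent with the build-time measurements in Section 5.5.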

EVALUATION
In this section, we perform extensive experiments to compare Cabin with existing optimized scan solutions.

Experimental Setup
Machine Configuration. The experimental machine is equipped with a 3.0GHz Intel Core i7-9700 processor (8 cores, 12MB L3 cache), 32GB DDR4 DRAM, and 16TB disks. The processor supports 256-bit SIMD instructions. The operating system is 64-bit Ubuntu Server 20.04 with Linux kernel 5.13.0-30. All testing programs are compiled using g++ 9.4 (i.e., the default version in Ubuntu 20.04) with the optimization flag -O3 and the SIMD flags -mavx, -mavx2, -mbmi, and -mbmi2. All experiments are performed in main memory.

Solutions to Compare. We compare Cabin against five optimized scan solutions: ByteSlice (BS) [13], Zone Maps (ZM) [27], Column Sketches (CS) [16], B+-tree [11], and BinDex [25]. We obtain the code of ByteSlice and BinDex from their GitHub repositories. We select a widely used in-memory B+-tree implementation, STX B+-Tree [3]. For Zone Maps and Column Sketches, we faithfully implement the techniques following the papers. We use single-byte column sketches as in the original experimental setting. Moreover, we try our best to optimize all scan solutions with software prefetching and SIMD techniques.
Both BinDex and Cabin can leverage more space to construct more filter bit vectors or filter sketches to achieve better scan performance. We set a reasonable space limit by considering the space cost of the B+-Tree. In our experimental setting, a rowID is 32 bits wide and each column data value takes b bits, where b = 8, 16, 32, or 64. Hence, the data column takes Nb bits, where N is the number of records. Since the B+-Tree's leaf level contains N pairs of (value, rowID), the B+-Tree size is at least N(b + 32) bits. (Note that the B+-Tree consumes more space for storing non-leaf nodes as well as other fields, such as sibling pointers, in leaf nodes.) Consequently, we set the default space upper limit of BinDex and Cabin to be (b + 32)/b × the column size (i.e., 5x for b=8, 3x for b=16, 2x for b=32, and 1.5x for b=64), which is smaller than the B+-Tree.
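The quoted multipliers can be checked directly; the sketch below simply evaluates the (b + 32)/b bound stated above (the function name is ours):

```python
def space_budget_ratio(value_bits, rowid_bits=32):
    """Default space limit of BinDex/Cabin relative to the data column:
    the minimal B+-Tree leaf size, (b + rowid_bits) / b times the column
    size, where b is the value width in bits."""
    return (value_bits + rowid_bits) / value_bits

assert space_budget_ratio(8) == 5.0    # 5x column size for 8-bit values
assert space_budget_ratio(16) == 3.0   # 3x for 16-bit values
assert space_budget_ratio(32) == 2.0   # 2x for 32-bit values
assert space_budget_ratio(64) == 1.5   # 1.5x for 64-bit values
```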
Note that this space limit is a lower bound on the space used by tree-structured indices in general, including not only B+-Trees but also more recent proposals such as ART [24] and learned indices [20]. This is because the base column values are not sorted, and a tree-structured index therefore has to store the (value, rowID) pairs for the data column.

Synthetic Workload. We generate each synthetic data column with one billion random values. We consider four cases: 1) integer data following a uniform distribution over the full value range; 2) integer data following a Zipf distribution to model skewed data; 3) integer data with N_d distinct values uniformly distributed in [0, N_d) to model domain-encoded categorical attributes; and 4) skewed floating point data following a skewed normal distribution (location=0, scale=1, shape=4) or a Pareto distribution (scale=1, shape=4) generated with the Boost library. Note that there are few duplicate values in cases 1 and 4, but a large number of duplicates in cases 2 and 3.
We perform scan operations with different types of predicates on the synthetically generated data columns. We run all experiments in a single-threaded test program. The output of all scan solutions is a result bit vector, except for the B+-tree, which outputs a rowID list.

Real-World Data Sets. We also conduct experiments on two widely used real-world data sets, DBLP [1] and IMDb [2]. The DBLP data set contains 5.26M records; we use the n_citation column in the scan experiment. The IMDb data set contains 10.2M records; we use the startYear column in the scan experiment. Both data sets contain a lot of duplicates.

TPC-H and SSB Workloads. Besides the above stand-alone tests, we evaluate Cabin in a full-fledged main-memory database system, MonetDB. To add Cabin to MonetDB, our implementation follows the approach of BinDex. Specifically, the Cabin-based select operator supports the same API as the original select operator, which takes a single column and a predicate as input. A new fetch operator supports the result bit vector instead of the rowID list in the original fetch operator. To run a SQL query, we first invoke MonetDB to generate the corresponding MAL script, which consists of operator calls to execute the selected query plan for the query. Then, we modify the MAL script to call either BinDex- or Cabin-enhanced scan operations. We run the original, BinDex-enhanced, and Cabin-enhanced MAL scripts to compare the query performance of MonetDB, MonetDB+BinDex, and MonetDB+Cabin, respectively. In the MonetDB experiments, we still bound the memory usage of BinDex and Cabin by (b + 32)/b × the column size for each indexed column, where b is the bit width of the data values.
We run the TPC-H and SSB (Star Schema Benchmark) workloads on MonetDB. For TPC-H, we generate data with scale factor 20. We follow previous studies [16, 25] to run TPC-H Q1 and Q6, which scan the Lineitem table with various filters to compute aggregates. We also run Q12, which performs a single join between the Lineitem and Orders tables in addition to scans and aggregates. For SSB, we generate data with scale factor 60 and run all the SSB queries.

Scan Performance on Uniform Data
In this subsection, we perform scan experiments on uniform data with few duplicates. We vary the predicate comparison operator, the data width, and the predicate selectivity.

Scan with ≤ Operator. Figure 5(a) compares the scan solutions with the ≤ operator on 32-bit integer data. The x-axis varies the selectivity from 0 to 100% in 1% increments. The y-axis reports the scan time in ms. Each reported point is the average over 10 runs.
First, we see that Zone Maps, Column Sketches, and ByteSlice show three flat lines as the selectivity increases. Zone Maps fall back to plain scans in most cases, reading 32 bits of data per record and displaying a flat (green) curve; only at very low or very high selectivities can Zone Maps effectively skip entire buckets. Column Sketches outperform Zone Maps by reducing the amount of data to read. Regardless of the selectivity, they read 8 bits per record in the sketched column to compare the predicate code with the 1-byte sketch codes, and perform about N/256 random accesses for the records whose sketch code equals the predicate code. ByteSlice reads the byte-level columnar data with an efficient SIMD algorithm and accesses a similar amount of data at all selectivities.
Second, both BinDex and Cabin show wave-like curves. As the selectivity increases, the predicate value moves across the intervals of the scan indices. The number of bits to refine depends on how far the predicate value is from the closest interval boundary, and thus varies periodically as the predicate value moves across multiple interval boundaries. This causes the wave shapes.
Third, the space of BinDex and Cabin is bounded by twice the column data size (i.e., 2 × 32N bits) as discussed in Section 5.1. Hence, the space limit is 64N bits. For BinDex, the position array takes 32N bits for storing N 32-bit rowIDs. Therefore, BinDex uses the remaining space to store 32 N-bit filter vectors, yielding 33 value intervals. In comparison, the optimal design of Cabin has g = 6 interval groups, each containing 2^w − 2 = 30 intervals with w = 5. That is, Cabin supports 180 intervals in total, over 5x as many as BinDex. As a result, Cabin reduces the interval size and the average number of bits to refine (i.e., 1/4 × interval size) by over 5x, attaining significantly better performance than BinDex.
Overall, Cabin achieves up to 11.7x, 8.38x, 5.86x, and 3.51x better performance than Zone Maps, Column Sketches, ByteSlice, and BinDex for 1%-99% selectivities, respectively. (The improvement over Column Sketches and ByteSlice at 0% and 100% is over 20x because of the shortcut optimization.) We compute the average scan time over all the measured selectivities. Compared to Zone Maps, Column Sketches, ByteSlice, and BinDex, Cabin improves the average scan time by 4.48x, 3.27x, 2.27x, and 1.70x, respectively.

Scan with Different Comparison Operators. Figure 5(b) shows the scan performance of different filter operators. Note that ≤ performs a single inequality comparison, and is thus representative of the <, >, and ≥ operators. For BETWEEN, Cabin decomposes c_1 ≤ v ≤ c_2 into v ≥ c_1 and v ≤ c_2. Hence, BETWEEN takes nearly twice as much time as ≤ in Cabin. Compared with Zone Maps, Column Sketches, ByteSlice, and BinDex, Cabin improves the average performance of scans with BETWEEN by 2.82x, 2.81x, 2.42x, and 1.70x, respectively. For =, Cabin and BinDex use the shortcut optimization that directly sets the matching bits in the result bit vector. As a result, Cabin and BinDex take a few ms to evaluate scans with =, which is orders of magnitude faster than the other solutions.

Index Space-Time Analysis Varying Data Width. Figure 6 analyzes the time-space trade-off of the scan solutions for queries with the ≤ operator while varying the data width. The x-axis is the index size normalized to that of the column data. The space bound is set as discussed in Section 5.1. The y-axis reports the scan time averaged across all the measured selectivities. In the figures, Zone Maps, Column Sketches, and ByteSlice appear as single points, while Cabin and BinDex can exploit larger space to achieve better scan performance and thus appear as curves. Note that the ByteSlice implementation does not support 64-bit data and thus is not tested on 64-bit data.
From the figures, we see that Cabin achieves the best performance (at 2x-5x sizes for 8-bit data, 2x-3x sizes for 16-bit data, 1x-2x sizes for 32-bit data, and 0.25x-1.5x column size for 64-bit data). Compared to BinDex, Cabin's higher performance mainly comes from the filter sketches, which support more intervals given the same space budget. Moreover, Cabin enjoys a much wider range of space configurations than BinDex. BinDex is at least as large as its position array, which takes 4x-0.5x as much space as the data column for 8-bit to 64-bit values. In contrast, the selective position array can effectively reduce Cabin's size with graceful performance degradation. (Note that BinDex implements the shortcut optimization only for =; Cabin extends it to optimize all comparison operators, including BETWEEN, when the selectivity is low.)
Comparison with B+-Tree at Low Selectivities. We find that the B+-Tree performs poorly at medium to high selectivities. To show good B+-Tree performance, we run scans with BETWEEN and focus on low selectivities from 0% to 0.4%. Given a specific selectivity, we randomly pick 1000 value pairs as the BETWEEN predicate values that satisfy the target selectivity, and report the average time of the 1000 scans in Figure 7(a). From Figure 7(a), we see that the B+-Tree is the second best solution when the selectivity is lower than 0.2%. However, as the selectivity increases, the performance of the B+-Tree quickly degrades because the B+-Tree leaf scan incurs random memory accesses and costly pointer chasing. At 0.4% selectivity, the B+-Tree becomes the slowest solution. In comparison, Cabin achieves the best performance by using the shortcut optimization at low selectivities.
Figure 7(b) shows the space-time scatter plot for low selectivities from 0% to 0.4%. We see that Cabin achieves the best time-space trade-off among all solutions. Note that, compared to the B+-Tree, Cabin is both smaller and faster. When the index size is reduced, Cabin employs the selective position array. If the BETWEEN range falls into intervals whose position arrays are not stored, the shortcut optimization cannot be applied. Hence, the scan performance degrades gracefully, leading to the arc shape at smaller index sizes.

Data with Different Characteristics
Scan Performance on Data with Duplicates. We study two representative scenarios where there can be a lot of duplicates in the column data: 1) skewed data with a Zipf distribution, which models the power-law distributions in real-world data sets; and 2) data with a small number of distinct values, which models categorical attributes that are domain encoded.
Figure 8(a) shows the space-time scatter plot for skewed data with the Zipf parameter set to 2. Our technique of data-aware intervals both reduces the space cost and improves the scan performance by supporting the popular values. As a result, Cabin achieves the best time-space trade-off in both cases, as shown in Figure 8.

Scan Performance on Skewed Floating Point Data. In both data sets, value ranges contain skewed amounts of data but the number of duplicates is low. Therefore, equal-sized intervals, which contain an equal number of records per interval, still work well. Cabin sees similar benefits as in the case of 32-bit uniform integer data in Figure 6(c).

Scan Performance on Real-World Datasets. Figure 10 shows the scan performance on DBLP's n_citation column and IMDb's startYear column. Since both data sets contain a lot of duplicates, Cabin constructs data-aware intervals to achieve good performance. From Figure 10, we see that Cabin achieves the best scan time at all sizes for DBLP and at 0.5x-2x column size for IMDb.

Benefit of Proposed Techniques in Cabin
We study the benefit of the proposed techniques of Cabin. All experiments in this subsection use 32-bit data and the ≤ operator.

Filter Sketches. Figure 11(a) shows the scan time while varying the sketch code width w from 2 to 9, limiting Cabin's size to 2 × the column size. The scan time is broken down into the three parts of the Cabin^F algorithm: 1) prepare and search (search), 2) generate the draft bit vector (draft), and 3) refine the bit vector (refine). Note that the search cost is tiny and hardly visible. As w increases, the number of intervals increases. Hence, the interval size and the number of bits to refine decrease accordingly, leading to the decreasing refine cost. On the other hand, a draft result bit vector is constructed from w filter sketches with the MLO computation. As w increases, the draft cost increases linearly, as guaranteed by Theorem 3.1. Our DesignSelection algorithm computes the optimal design parameters. In this case, w = 5 obtains the best scan time, which is consistent with the output of the DesignSelection algorithm.

Scan Time Breakdown Varying g and p. Figure 12 depicts the scan performance of Cabin with w = 5 while varying g and p to satisfy the space limit. The line shows the average scan time.

Data-aware Intervals. We compare Cabin^FS and Cabin^FSD to understand the benefit of data-aware intervals. Similar to Section 5.3, we consider two cases: 1) skewed data with a Zipf distribution, and 2) data with a small number of distinct values. Cabin^FSD employs data-aware intervals to optimize for popular values. For case 1), Figure 14(a) shows the scan time while varying the Zipf parameter. We see that as the data becomes more skewed, Cabin^FSD achieves a more significant improvement over Cabin^FS. For case 2), Figure 14(b) reports the scan performance while varying the number of distinct values. We see that Cabin^FSD works better than Cabin^FS when there are fewer than 1000 distinct values. In such cases, entire intervals contain the same values, and the optimization of data-aware intervals is applicable. Cabin^FSD improves the scan time and reduces the space cost at the same time.

Build Time
Popular Value Discovery and Build Time Breakdown. Figure 15 evaluates the sort-based (sort), hash-based (hash), and sample-based (sample) methods for discovering popular values during Cabin index building. The build time is decomposed into five components: popular value discovery (PV), optimal design selection excluding PV (OD), and structure building for the interval table (IT), position array (PA), and filter sketches (FS). We see that the optimal design selection using Algorithm 3 is fast (i.e., 3-41 microseconds) in all cases. Compared to sort and hash, sample reduces the cost of PV by using a small sample. It also avoids the sorting cost in PA for intervals whose position array is not stored (i.e., p < 100%). Hash works well for data with duplicates, but incurs prohibitively high hashing cost for uniform data because the hash table is much larger than the CPU cache.

In contrast, in queries Q2.1-Q4.3, the scan accounts for only a small fraction of the total query time. As a result, the index-enhanced solutions have similar performance as MonetDB. Overall, if we focus on the scan time, we see that MonetDB+Cabin improves the scan performance in all SSB queries by a factor of 5.2-560x compared with MonetDB and 1.1-6.7x compared with MonetDB+BinDex.

DISCUSSION
PAX Layout. In this work, Cabin is constructed on a whole data column. In addition to the column layout, PAX is another popular data layout in OLAP systems [4]. In PAX, a table is divided into row groups, and each row group employs the column layout. To support PAX, we can simply build a Cabin per row group, and use the per-row-group Cabins to accelerate scans.

Data Updates. For deletes, we can record the deleted rows in a delete bit vector. A scan then computes the bit-wise AND of the scan result bit vector and the delete bit vector. For inserts, we can employ the main+delta approach: we build Cabin indices for the main data, and append newly inserted values to the delta data. When merging the main and delta into the new main data, we re-construct the Cabin indices. In this way, a scan consists of the Cabin-enhanced scan on the main data and a plain scan on the delta data. An update can be supported as a delete followed by an insert.
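The delete handling described above amounts to a single bit-wise AND per scan. A minimal sketch, using Python integers as bit vectors for brevity (the function name and the live-row convention of the delete bit vector are illustrative):

```python
def apply_deletes(scan_result, delete_bv):
    """Mask out deleted rows: bit i survives only if row i matches the
    predicate AND is still live. delete_bv has 1-bits for live rows."""
    return scan_result & delete_bv

# Rows 0-7; rows 2 and 5 match the predicate, row 5 was deleted.
scan_result = 0b00100100   # bits 2 and 5 set
delete_bv   = 0b11011111   # bit 5 cleared (row 5 deleted)
assert apply_deletes(scan_result, delete_bv) == 0b00000100
```

Because the AND is a sequential pass over two bit vectors, the overhead of delete handling is negligible compared with the scan itself.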
Support for Strings. Scan indices, including Cabin, mostly focus on numeric values and filter predicates (e.g., <, >, ≤, ≥, =, ≠, or BETWEEN) that specify value ranges. In many cases, strings can be encoded as numeric values and effectively supported for such predicates. However, string matching operations (e.g., LIKE) cannot be easily supported by scan indices. There is no clear sort order for string matching operations, yet the sort order is the basis for the design of many scan indices. It would be interesting to study how to combine inverted indices and scan indices to improve OLAP queries with string matching operations.
Optimization Based on Query History. In this work, our Cabin design does not rely on any query history. Here, we consider potential optimizations if the query history is available. (1) Index selection: We can identify frequent queries and analyze the importance of scans to query performance. Then, we can choose a subset of columns on which to build Cabin indices in order to maximize the performance benefit given a memory space budget. (2) Query-aware selective position array: Instead of randomly selecting intervals, we can choose the intervals whose position arrays are removed based on the distribution of query predicate values.

CONCLUSION
In this paper, we propose and evaluate a novel scan index, Cabin.Extensive experiments show that Cabin achieves better time-space tradeoff than state-of-the-art scan solutions.Cabin is a promising scan index for main-memory analytical databases.

Figure 2
Figure 2(c) depicts Cabin. It is composed of three structures:

• Filter sketches: According to Observation 1, we can use w-bit sketch codes (e.g., w=3 in Figure 2(c)) to represent m = 2^w − 2 (e.g., 6) intervals. Then we divide all the intervals into groups of m intervals each, and build w filter sketches for every group of m intervals. The j-th interval in group i is denoted as I_{i,j}. The first and last intervals of each group are two special virtual intervals, representing values less than and greater than the entire group, respectively. In this way, the intervals of a group cover the full value range. (This is important for supporting the selective position array.) Note that while Figure 2(c) draws only one group of intervals due to paper space constraints, there are usually multiple interval groups in a Cabin.

• Selective position array: According to Observation 2, we can remove the portion of the position array for a subset of intervals to save space. As shown in Figure 2(c), the shaded portion of the selective position array is not physically stored. However, an interval without a physical position array may still be visited by a scan. In such cases, we cannot perform a binary search for the predicate value in the position array. Instead, Cabin relies on the filter sketches for evaluating the predicate. Since the filter sketches of an interval group cover the full range of values, they can be used in a similar fashion to Column Sketches.

• (Data-aware) interval table: As shown in Figure 2(c), the interval table keeps the metadata of each interval in each group, including the value range, the sketch code word, the number of records in the interval, its start location in the selective position array, and a flag used by the selective position array and data-aware intervals to indicate the interval type. To build the interval table, it is natural to divide the value range into even-sized intervals. However, according to Observation 3, even-sized intervals may be sub-optimal when there are a large number of duplicate values. In such cases, Cabin judiciously makes a data-aware selection of intervals to further optimize the scan.

In the following subsections, we describe three Cabin designs addressing the three observations progressively:
• Cabin^F: a design with filter sketches, even-sized intervals, and a fully physical position array (cf. Section 3.2)
• Cabin^FS: Cabin^F with the selective position array (cf. Section 3.3)
• Cabin^FSD: Cabin^FS with data-aware intervals (cf. Section 3.4)
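The role of the filter sketches can be illustrated with a scalar (non-SIMD) sketch: assign each record a small code according to its interval, then a draft bit vector for a predicate v ≤ c marks every record whose code is strictly smaller than the code of c's interval; equal-code records are exactly the ones the refine step must fix. The interval-to-code assignment below is an illustration, not the paper's bit-sliced encoding.

```python
import bisect

def build_sketch_codes(column, boundaries):
    """Assign each record a small code: the index of the first interval
    boundary that is >= the value (boundaries sorted ascending, each an
    inclusive upper bound of its interval)."""
    return [bisect.bisect_left(boundaries, v) for v in column]

def draft_bitvector(codes, target_code):
    """Draft result for 'v <= c' where c falls in interval target_code:
    records with strictly smaller codes surely match; records with the
    same code need refining; larger codes surely do not match."""
    return [int(code < target_code) for code in codes]

column = [3, 15, 8, 22, 1, 17]
boundaries = [5, 10, 20]     # intervals (-inf,5], (5,10], (10,20], (20,inf)
codes = build_sketch_codes(column, boundaries)
# Predicate v <= 12 falls into interval (10,20], i.e., code 2.
draft = draft_bitvector(codes, 2)
assert draft == [1, 0, 1, 0, 1, 0]
```

In Cabin, the comparison against the code is evaluated over the w bit-sliced filter sketches with SIMD logical operations rather than record-at-a-time, but the information produced is the same.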
Figure 8(b) shows the space-time scatter plot when there are 100 distinct values. In both cases, the data column contains popular values that span entire value intervals.

Proc. ACM Manag. Data, Vol. 2, No. 1 (SIGMOD), Article 57. Publication date: February 2024.

Figure 9
Figure 9(a) and (b) show the time-space trade-off of the scan solutions for 32-bit floating point data that follow the skewed normal distribution and the Pareto distribution, respectively. (Note that ByteSlice does not support floating point data.)

Table 1. Comparing scan index structures.

[Figure: an example interval table with columns Range, Code, #Records, and Index in Selective Position Array, over a position array of rowIDs 0-23.]

Scan Algorithm for the "≤" Operator. The scan algorithm of Cabin^F for a predicate "v ≤ c" is listed in Algorithm 1:

1. Prepare (Lines 2-4). Cabin performs a binary search for the predicate value c in the interval table. Suppose the target interval I_{i,j} contains n records and starts at position s of the position array. Then, Cabin binary-searches for c between positions s and s+n−1 in the selective position array. The resulting split point separates the rowIDs whose data values satisfy the predicate "v ≤ c" from the rest of the interval.

2. Find Closest Matching Intervals (Lines 5-8). To compute the result bit vector, there are two options: a) compute the bit vector representing all intervals before I_{i,j} (i.e., {I_{i,0}, ..., I_{i,j−1}}), then flip up the bits of the records to the left of the split point; or b) compute the bit vector representing {I_{i,0}, ..., I_{i,j}}, then flip down the bits of the unwanted records at and to the right of the split point. Cabin chooses the option with fewer bit flips.

3. Generate Draft Bit Vector (Lines 9-10). This step calls the MLO computation function for every V_L-bit segment of the filter sketches, where V_L is the SIMD vector length (e.g., 256 in Intel AVX2). Our implementation pre-generates the MLO functions using SIMD operations for all code words 0 ≤ c ≤ 2^w − 1 and 2 ≤ w ≤ 9. (This is sufficient since the optimal w is often small.) We store the function pointers in a table and invoke the relevant functions. (Alternatively, the SIMD code can be generated online if the database system supports query compilation [14, 19, 21, 28, 36].)

4. Refine Bit Vector (Lines 11-13). Finally, Cabin performs the bit flips to account for the difference between the interval boundary and the split point. Step 2 has already recorded the incorrect positions, so Cabin simply flips the corresponding bits in a loop. We use software prefetching to accelerate the random memory accesses caused by the bit flips.
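The four steps, including the choice between the two refine options in Step 2, can be sketched in scalar Python (a stand-in for the SIMD implementation; the structure and function names are illustrative):

```python
import bisect

def scan_leq(column, pos_array, interval_starts, boundaries, c):
    """Cabin^F-style scan for 'v <= c' (scalar sketch).

    pos_array holds rowIDs sorted by value; interval_starts[j] is the
    position-array index where interval j begins; boundaries[j] is the
    inclusive upper bound of interval j."""
    n = len(column)
    j = bisect.bisect_left(boundaries, c)            # 1. prepare: target interval
    s = interval_starts[j]
    e = interval_starts[j + 1] if j + 1 < len(interval_starts) else n
    # Split point: first position in the interval whose value exceeds c.
    keys = [column[r] for r in pos_array[s:e]]
    split = s + bisect.bisect_right(keys, c)
    result = [0] * n
    # 2. choose the closer interval end; 3. draft; 4. refine by flipping.
    if split - s <= e - split:
        for k in range(s):                 # draft: intervals before j all match
            result[pos_array[k]] = 1
        for k in range(s, split):          # refine: flip matching records up
            result[pos_array[k]] = 1
    else:
        for k in range(e):                 # draft: intervals up to j all match
            result[pos_array[k]] = 1
        for k in range(split, e):          # refine: flip non-matching down
            result[pos_array[k]] = 0
    return result

column = [3, 15, 8, 22, 1, 17]
pos_array = sorted(range(len(column)), key=lambda r: column[r])
boundaries = [5, 10, 20, float("inf")]    # (-inf,5], (5,10], (10,20], (20,inf)
interval_starts = [0, 2, 3, 5]
res = scan_leq(column, pos_array, interval_starts, boundaries, 12)
assert res == [1, 0, 1, 0, 1, 0]
```

In Cabin itself, the draft in Step 3 comes from the SIMD MLO computation over the filter sketches instead of the per-rowID loops above, and the refine loop touches at most half an interval because Step 2 always picks the cheaper side.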

Table 2. Terms used in the cost analysis: t_sr and t_sw are the times to sequentially read and write a byte, t_rr and t_rw the times for a random read and write, t_op the time for a SIMD logical operation, V_L the SIMD vector length, |I_{i,j}| the number of records in interval I_{i,j}, n the total number of intervals, and N the number of records.

T_F(i,j), the scan time of Cabin^F, consists of four components:

1. Prepare (T_prepare). Cabin^F binary-searches the interval table containing n intervals, and then binary-searches the target interval in the position array. Hence, this step takes O(log N) time.

2. Find Closest Matching Intervals (T_find). Step 2 involves O(1) comparison and assignment operations.

3. Generate Draft Bit Vector (T_draft). This step reads a group of w N-bit filter sketches, performs on average c_w SIMD logical operations for every V_L code words (c_w = O(w)), and writes an N-bit result vector. Hence, T_draft = max{wN/8 · t_sr, c_w N/V_L · t_op, N/8 · t_sw}. The max takes into consideration the overlapping of computation and memory accesses [15, 33].

4. Refine Bit Vector (T_refine). As Step 2 finds the closer interval end to the split point, the number of bit flips is at most half of the interval. On average, 1/4 of the positions need to be corrected. For each position, Step 4 reads the rowID in the position array and then flips the bit in the draft result. Hence, T_refine = (|I_{i,j}|/4) · (b_r/8 · t_sr + t_rw), where b_r = 32 is the number of bits of a rowID and t_rw is the time for a random write.

It follows from the above discussion that:
T_F(i,j) = T_prepare + T_find + T_draft + T_refine = O(log N) + max{wN/8 · t_sr, c_w N/V_L · t_op, N/8 · t_sw} + |I_{i,j}| · (t_sr + 1/4 · t_rw)

T_FS(i,j), the scan time of Cabin^FS, also consists of four components. The costs of the first three steps are similar to those of Cabin^F, except that it does not search within the interval. The main difference is the cost of the refine step. Since the associated portion of the selective position array is not stored, Cabin^FS computes the equal-code positions with SIMD logical operations, and reads all |I_{i,j}| base data values in interval I_{i,j} to evaluate the predicate. On average, half of the values satisfy the predicate, and the scan flips |I_{i,j}|/2 bits in the draft result. Hence, T'_refine = T_eq + |I_{i,j}| · (t_rr + 1/2 · t_rw), where T_eq is the SIMD cost of computing the equal-code positions and t_rr is the time for a random read. Hence, we have:
T_FS(i,j) = O(log n) + max{wN/8 · t_sr, c_w N/V_L · t_op, N/8 · t_sw} + T_eq + |I_{i,j}| · (t_rr + 1/2 · t_rw)

Finally, we compute T_FSD for popular values. The scan searches the interval table in O(log n) time and then computes the result. There are two cases. For a skew interval group, the scan returns the filter sketch with negligible cost. For a skew interval, the scan generates and returns the draft bit vector in T_draft time. Therefore:
T_FSD(i,j) = O(log n) + T_draft, if flag = 2;   T_FSD(i,j) = O(log n), if flag = 3.

Space Overhead. We first consider all equal-sized intervals:
• Interval table (S_IT): it stores the meta-information of the intervals, and its space cost is negligible.
• Filter sketches (S_FS): there are g interval groups, and each group has w N-bit vectors. Hence, S_FS = gwN/8 bytes.
• Selective position array (S_SP): the full position array takes 32N/8 bytes; since a proportion p is physically stored, S_SP = 32pN/8 bytes.
Thus, for equal-sized intervals, S = S_IT + S_FS + S_SP ≈ gwN/8 + 4pN bytes.

The average scan time then combines the three interval sets:
T_avg = Σ_{I_{i,j} ∈ I_1} (|I_{i,j}|/N) · T_F(i,j) + Σ_{I_{i,j} ∈ I_0} (|I_{i,j}|/N) · T_FS(i,j) + Σ_{I_{i,j} ∈ I_s} (|I_{i,j}|/N) · T_FSD(i,j)

Algorithm 3: Optimal design selection.
Input: column data D[1..N], space budget S_B
Output: sketch code width w, number of groups g, stored proportion p
1  Function DesignSelection(D, S_B)
2    PV = candidate popular values of D;
3    T_best = ∞;
4    for w = 2 to 9 do
5      compute the range [g_min, g_max] from the space budget;
6      for g_min ≤ g ≤ g_max do
7        n = g · (2^w − 2);
8        I_s = DataAwareInterval(D, PV, n);
9        I_n = EqualInterval(D, I_s, n);
10       p = ComputeSP(N, I_s, I_n, w, g);
11       (I_0, I_1) = GetSelPosArray(I_n, p);
12       compute T_avg given I_0, I_1, and I_s;
13       if T_avg < T_best then
14         T_best = T_avg; update (w, g, p);
15   return (w, g, p);
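The max-of-three-terms structure of the draft cost can be checked numerically. The sketch below uses illustrative (not measured) time constants and assumes the average number of SIMD operations per segment grows linearly with w:

```python
def t_draft(N, w, V_L=256, t_sr=1.0, t_op=2.0, t_sw=1.0):
    """Draft-generation cost per the model: reading w N-bit sketches,
    ~w SIMD logical operations per V_L-bit segment (an assumption for
    this sketch), and writing an N-bit result. Time units arbitrary."""
    read = w * N / 8 * t_sr      # sequentially read w N-bit sketches
    compute = w * N / V_L * t_op  # SIMD logical operations
    write = N / 8 * t_sw          # sequentially write the N-bit draft
    return max(read, compute, write)  # computation overlaps memory access

# With these constants the read term dominates, so doubling w doubles
# the draft cost, matching its linear growth in w.
assert t_draft(10 ** 6, 8) == 2 * t_draft(10 ** 6, 4)
```

This is the trade-off the DesignSelection algorithm balances: a larger w shrinks the refine term |I_{i,j}|/4 but inflates T_draft, so the optimum sits at a moderate w (w = 5 in the 32-bit uniform experiment).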

Table 3 .
Build time of scan solutions (in seconds).