Batchman and Robin: Batched and Non-batched Branching for Interactive ZK

Vector Oblivious Linear Evaluation (VOLE) supports fast and scalable interactive Zero-Knowledge (ZK) proofs. Despite recent improvements to VOLE-based ZK, compiling proof statements to a control-flow oblivious form (e.g., a circuit) continues to lead to expensive proofs. One useful setting where this inefficiency stands out is when the statement is a disjunction of clauses L_1 ∨ ... ∨ L_B. Typically, ZK requires paying the price to handle all B branches. Prior works have shown how to avoid this price in communication, but not in computation. Our main result, Batchman, is asymptotically and concretely efficient VOLE-based ZK for batched disjunctions, i.e., statements containing R repetitions of the same disjunction. This is crucial for, e.g., emulating CPU steps in ZK. Our prover and verifier complexity is only O(RB + R|C| + B|C|), where |C| is the maximum circuit size of the B branches. Prior works' computation scales in RB|C|. For non-batched disjunctions, we also construct a VOLE-based ZK protocol, Robin, which is (only) communication efficient. For small fields and for statistical security parameter λ, this protocol's communication improves over the previous state of the art (Mac'n'Cheese, Baum et al., CRYPTO'21) by up to a factor of λ. Our implementation outperforms prior state of the art. E.g., we achieve up to 6× improvement over Mac'n'Cheese (Boolean, single disjunction), and for arithmetic batched disjunctions our experiments show we improve over QuickSilver (Yang et al., CCS'21) by up to 70× and over AntMan (Weng et al., CCS'22) by up to 36×.


Introduction
Zero Knowledge (ZK) proofs [GMR85] allow a prover P to demonstrate to a verifier V the validity of a given statement while revealing nothing beyond the validity of the statement. The past decade has seen an explosion in the design, implementation and deployment of concretely efficient zero-knowledge proof systems.
Large overheads of P and V remain a bottleneck in scaling zero-knowledge to very large statements. One major overhead comes from complex control flow, which, explicitly or implicitly, includes repeated evaluation of disjunctions. Examples include complex statements like 'this program has a bug' [HK20b] or even (the more complex) 'this program does not have a bug' [LAH+22].
We focus on minimizing total end-to-end proof time, which includes communication and total computation by both P and V.
While VOLE-based ZK is fast, its costs (communication, P and V computation) still scale linearly in the size of the proof statement. It is interesting to exploit statement structure (e.g., disjunctions) to achieve further improvement with the ultimate goal of costs that grow sublinearly (with small constants) in the proof statement.
ZK proofs of disjunctions. Seeking improved performance and motivated by the structure of real-world programs, prior works [Kol18, HK20b, BMRS21, GGHAK22, GHAKS23] specifically optimized for proofs of disjunctive statements of the form C_1(w) = 0 ∨ ... ∨ C_B(w) = 0 for B different subcircuits referred to as branches. Their underlying techniques are often referred to as stacking, following the terminology introduced by [HK20b]. For disjunctions, because P only needs to demonstrate the truth of one branch, it is possible to design custom systems that achieve up to factor B improvement.
Our first contribution, Robin (see Section 1.1), is a protocol that improves the cost of disjunctive statements in VOLE-based ZK.
The crucial application of such statements is the emulation of a CPU inside ZK. Indeed, each step of a basic CPU executes one out of a possibly large set of instruction types, and this is repeated many times until the program halts. ZK systems that emulate a CPU are interesting, because they enable end users to express complex proof statements as programs written in commodity programming languages; see, e.g., [HYDK21]. More generally, a program can be compiled into a single large for-loop over a switch statement executing one of many (hundreds or thousands) of straight-line program segments. This is called loop coalescing [GF95]; loop coalescing is known to be useful for fast RAM-based ZKP [WSR+15].
Concretely efficient ZK systems (sublinear in computation and communication) for batched disjunctions have not been considered.
Our second, and most exciting, contribution, Batchman, is an interactive VOLE-based ZKP system that efficiently handles batched disjunctions. The surprising property of this proof system is that, as compared to naïve handling, it achieves not only factor B communication improvement, but also up to factor B computation improvement for both P and V. Thus, our protocol is sublinear in the proof statement. While our total end-to-end runtime scales with the number of branches B and repetitions R, it crucially does not scale in the product RB. This enables CPU-emulation-based ZK operating over large and expressive instruction sets.
Batching zero-knowledge proofs has proven to be an important step towards determining the feasibility of full-fledged zero-knowledge. Understanding the complexity of disjunctive statements has also been of theoretic interest and traces back to the work of Feige and Shamir [FS90] and Cramer, Damgård and Schoenmakers [CDS94] for the design of witness indistinguishable proofs and efficient Σ-protocols, respectively.
An optimization for Batchman via recent ZK ROM [YH23]. We note that a straightforward optimization for our Batchman protocol can be achieved by directly plugging in the advanced ZK Read-Only Memory (ROM) built in [YH23]. This can improve the communication complexity of Batchman to O(B + R|C|). In fact, we implicitly build a (worse) ZK ROM in this work. Any future optimizations to ZK ROM can easily be plugged into Batchman to achieve further improvement.

Our Contribution
As mentioned above, we construct two VOLE-based ZK protocols: Robin, for single disjunctions, and Batchman, for batched disjunctions. In our batched protocol, except for a one-time additive B|C| cost, P's and V's computation costs are independent of the number of disjunctions. In comparison, "flattening" out the circuit would result in computational complexity proportional to RB|C|. Our protocols are concretely performant. For instance, Robin scales in branches up to 6× better than Mac'n'Cheese [BMRS21] when |C| = 10^9, and demonstrates up to 16× improvement over QuickSilver [YSWW21] when B = 100. Batchman demonstrates up to 36× improvement over AntMan [WYY+22] when B = 64 and R = 1024, and up to 70× improvement over QuickSilver [YSWW21] when B = R = 400. We provide a summary of our comparison to prior work in Table 1; see detailed comparison to prior work in Sections 1.2 and 7.
A bird's-eye view of our protocols. We remark that achieving computational cost sublinear in RB|C| is possible when we wish to evaluate the same disjunctive statement R > 1 times, if we are allowed non-black-box access to some underlying cryptographic primitive. Suppose P and V, in a pre-processing step, compute the hash of the description of each of the B branches under a collision-resistant hash function. Then, for each instance of the disjunction, P includes in her witness the circuit description of the active branch and proves via a universal circuit that the circuit on some input witness returns 0 and that the hash of the circuit description belongs to the set of precomputed hashes. The complexity is Õ(B|C|) for the first instance (to compute B hashes) and Õ(B + |C|) thereafter.
Such an approach is impractical due to its use of universal circuits and its non-black-box use of a hash function.
Seeking concretely efficient constructions, we restrict ourselves to black-box use of underlying primitives only. Surprisingly, in the same batched setting, we design an efficient construction that builds on the high-level concept of "checking circuit hashes", but our construction achieves asymptotic complexity O(RB + R|C| + B|C|) while making only black-box use of VOLE (and while using no other cryptographic primitives). In short, our approach shows that the "hash" of each branch can be determined by a random challenge that is chosen by V after P has committed to her witness. To compute and check these "hashes", each party computes simple linear combinations of field elements. See Section 3 for details.

Related Work
VOLE-based interactive zero-knowledge protocols. Consider a fan-in-2 circuit C with |C| multiplication/addition gates over some field. [DIO21] achieved 1 field element of communication per multiplication gate based on a technique called Line-Point Zero Knowledge (LPZK). [YSWW21] further improved LPZK and implemented the technique. Our work handles multiplication gates by directly applying the LPZK technique, as well as [YSWW21]'s improvement for proving inner products. Our implementation (see Section 7) builds on [YSWW21]'s open source repo [WMK16].
[DILO22] improved LPZK communication to 0.5 field elements per multiplication gate at the cost of increased computation and requiring a random oracle. Concrete performance of [DILO22] is similar to [YSWW21]; we build on [YSWW21] for simplicity.
[WYY+22] achieved, for the first time, a VOLE-based ZK system with sublinear communication, with communication cost O(|C|^{3/4}). While the approach remains quite efficient, its performance is not strictly better than prior work, because it achieves its improved communication at the cost of computation. The technique performs polynomial interpolation, incurring factor O(log |C|) overhead, and it also employs relatively expensive additively homomorphic encryption. [WYY+22] consider batching, but not batched disjunctions; we compare with them in Section 7.
ZK disjunctions. A line of works [HK20b, BMRS21, GGHAK22, GHAKS23] augments ZK proofs with efficient handling for disjunctions. Consider B circuits C_i, i ∈ [B], each with the same number of inputs/multiplications, and suppose P holds a single witness for C_a, a ∈ [B] (the active branch). These works achieve communication that scales with |C| rather than B|C|. Such works say that they "stack" the branches, following terminology introduced by [HK20b].
Most related to our work, [BMRS21] was the first (and the only) work to stack in the VOLE-based ZK setting. [BMRS21] implements multiplication gates with a custom protocol, and it is incompatible with the LPZK technique. Their multiplication procedure communicates 2 extra extension field elements per multiplication. Our protocols are compatible with LPZK. Our protocols communicate 2 extra field elements (not extension field elements) per multiplication, and our work is the first to consider batched disjunctions.
Even at a high level, our approach seems quite different from these prior approaches. In prior works, P proves satisfiability of each branch (thus paying computation scaling with B), but even honest P can "cheat" on each inactive branch. For example, [HK20b] allows P to learn cryptographic seeds used to garble the inactive branch circuits. Our approach instead achieves branching by leveraging (1) simple properties of VOLE correlations and (2) a random challenge from V to "compress" the description of each branch.
The RAM model of computation is relevant to the batched disjunctions setting. Indeed, because of constant RAM access cost in ZK [FKL+21, DdSGOTV22], for batch size R ≥ B, RAM-ZK can be used to achieve batched disjunctions. By simply structuring the proof statement as a RAM program, loading the program (of size B|C|) into the RAM memory, and running the RAM-ZK, the proof will naturally terminate in time O(B|C| + R|C|).
RAM-based ZK is not competitive with our batched protocol for two reasons. First, our approach demonstrates a theoretical advantage. Suppose the batch is relatively small, i.e., R = o(B). In this case, the RAM approach is less appealing, since it is necessary to load the program into memory, immediately incurring O(B|C|) cost. At the same time, our communication cost is independent of the quantity O(B|C|), and so it works well in this setting. More importantly, our approach elides the expensive machinery required for RAM emulation and is concretely performant. Indeed, our motivating application is the acceleration of such RAM-based proofs, so our low constant costs are

Functionality F^{R,B}_ZK
Upon receiving (prove, C_1, ..., C_B, w_1, ..., w_R, a_1, ..., a_R) from prover P and (verify, C_1, ..., C_B) from verifier V:
• If for all i ∈ [R] it holds that C_{a_i}(w_i) = 0, then output (true) to V; else output (false) to V.

• The prover is P. We refer to P by she, her, hers...
• The verifier is V. We refer to V by he, him, his...
• x ≜ y denotes that x is defined as y.
• We denote that x is uniformly drawn from a set S by x ∈_$ S.
• We denote column vectors by bold lower-case letters (e.g., a), where a_i (or a[i]) denotes the ith component of a (starting from 1) and a[i : j] the subvector [a_i, ..., a_j]^T. We use glue(·) to stitch column vectors (e.g., glue(a, b) ≜ [a^T | b^T]^T).
• We denote matrices by bold upper-case letters (e.g., A), where A(i) denotes the ith row vector of A (starting from 1) and A[i] denotes the ith column vector of A (starting from 1). A(i)[j] denotes the jth value in the ith row.
• We prove batches of disjunctions. We call each member of the batch a repetition. B denotes the number of branches and R denotes the number of repetitions.
• We denote a finite field of size p by F_p, where p ≥ 2 is a prime or a power of a prime. Extension fields are defined and denoted in the standard way.
Functionality F^{p,q}_sVOLE
Consider a base field F_p and an extension field F_{p^q}. The functionality interacts with P, V and the adversary A as follows:
Initialize. Upon receiving (init) from P and V, if V is honest, sample ∆ ∈_$ F_{p^q}, else receive ∆ from A. Store ∆ and send it to V. Ignore subsequent (init).
Extend. Upon receiving (extend, n) from P and V, do the following:
• Send (u, w) to P and v to V.

Security Model
We formalize our protocols under the universally composable (UC) framework [Can01] and prove security in the presence of a malicious, static adversary. The functionality F^{R,B}_ZK (cf. Figure 1) models a zero-knowledge proof (of knowledge) for an R-repetition disjunction of B circuits.
In particular, F^{1,B}_ZK is the ZK functionality for a single disjunction. Looking ahead, our protocol for a single disjunction implements F^{1,B}_ZK (for B ∈ Z^+) and our protocol for batched disjunctions implements F^{R,B}_ZK (for R, B ∈ Z^+). See Appendix A.1 for UC framework background.
In VOLE-based ZK, VOLE correlations allow P to commit to wire values using information-theoretic MACs (IT-MACs). Let x ∈ F be a field element known to P (e.g., part of her witness). An IT-MAC commitment to x is a pair of values held respectively by P and V. Specifically, V holds a key k ∈_$ F and P holds m = k − x · ∆, where ∆ ∈_$ F is a key which is global to the entire proof, known to V, and hidden from P. We denote a commitment to x under global key ∆ by writing [x]_∆, where ∆ will be omitted if it is clear from the context. I.e., [x]_∆ is a pair of tuples (m_x, x), held by P, and (k_x, ∆), held by V. We also write [x]_∆ (with bold x) to denote IT-MACs of a vector x. Note that P can efficiently open a commitment [x]_∆ by sending (m_x, x); V accepts iff m_x = k_x − x · ∆.
An IT-MAC [x]_∆ has the following notable features:
• Hiding: V's share (k_x, ∆) is independent of the secret x, so the share trivially hides x.
• Binding: A malicious P cannot cheat and open a commitment [x]_∆ to some x' ≠ x. Indeed, forging a suitable opening is as hard as guessing ∆. Note that (m_x, x) conveys no information about ∆.
• Linear homomorphism: P and V can locally (i.e., without communication) compute a commitment to any public linear combination of committed values by applying the same combination to their respective shares.
The VOLE functionality allows P and V to construct n IT-MAC commitments, each to a uniformly random value [r]_∆ where r ∈_$ F. A random commitment [r]_∆ can be easily translated into a commitment [x]_∆ where x is a value chosen by P: P simply sends to V the single field element (x − r), and V correspondingly locally shifts his key by (x − r) · ∆. Thus, to commit to n field elements, P and V first execute VOLE, and then P transmits n · ⌈log |F|⌉ bits.
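The IT-MAC mechanics above can be made concrete with a small simulation. The following is a minimal sketch, not the paper's implementation: it works over the prime field with p = 2^61 − 1, simulates the VOLE functionality with a local random number generator, and bundles both parties' views into one tuple for brevity. All function names are hypothetical.

```python
import random

P = 2**61 - 1  # illustrative prime modulus (assumption: a large prime field, no extension)

def random_commitment(delta):
    """Stand-in for one VOLE correlation [r]: P gets (m, r), V gets k, with m = k - r*delta."""
    r, k = random.randrange(P), random.randrange(P)
    return ((k - r * delta) % P, r), k

def derandomize(prover_share, key, x, delta):
    """Turn [r] into [x]: P sends d = x - r, and V shifts his key by d*delta."""
    m, r = prover_share
    d = (x - r) % P                      # the single field element P transmits
    return (m, x % P), (key + d * delta) % P

def add(c1, c2):
    """[x], [y] -> [x + y]; each party only adds its own shares."""
    (m1, x1), k1 = c1
    (m2, x2), k2 = c2
    return ((m1 + m2) % P, (x1 + x2) % P), (k1 + k2) % P

def scale(c, const):
    """[x] -> [const * x] for a public constant."""
    (m, x), k = c
    return ((const * m) % P, (const * x) % P), (const * k) % P

def verify_opening(opening, key, delta):
    """V's check when P opens a commitment by sending (m, x)."""
    m, x = opening
    return m == (key - x * delta) % P
```

For example, committing to 20 and 22 and locally deriving a commitment to 3 · 20 + 22 yields shares that still satisfy m = k − x · ∆, while an opening to any other value fails V's check unless P guesses ∆.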
Field extension. When the ZK statement is defined over a small field F_p (e.g., Boolean), we need to use IT-MACs defined over an extension field F_{p^q} to ensure that ∆ cannot be easily guessed. In this case, it suffices to consider random IT-MACs [r]_∆ where r is drawn from the base field F_p, because r is only used to mask x ∈ F_p. There exists a VOLE variant that works over F_{p^q}, but generates IT-MACs of such r values from the subfield F_p. This variant is called subfield VOLE. I.e., an IT-MAC [r]_∆ generated by subfield VOLE will have m_r, k_r, ∆ ∈ F_{p^q} but r ∈ F_p.
It is sometimes necessary to mix VOLE and subfield VOLE correlations in a single proof. This is easy: we can linearly combine q subfield VOLE correlations into 1 VOLE correlation over the extension field F_{p^q}. This incurs a factor-q blowup.

LPZK [DIO21] and QuickSilver [YSWW21]
VOLE-based ZK works in the "commit-and-prove" paradigm: P commits to her extended witness with IT-MACs, and later proves to V that committed values are consistent with the proof statement.
Consider a proof statement encoded as an arithmetic circuit C, and let P hold a witness w. To prove C(w) = 0, P first computes her extended witness w_ext, which consists of w along with each multiplication gate's output value upon evaluating C(w). The parties then construct commitments to w_ext, as described above. From here, P and V can use the linear homomorphism property of IT-MACs to construct commitments to the output value of each addition gate.
Note that at this point, P is now committed to a particular value for every wire in the circuit. Two tasks remain:
• P must prove to V that the committed value of the output wire is 0. This is achieved simply by opening.
• For each multiplication gate, P must prove that the gate's committed input values indeed multiply to the gate's committed output value.
Previous VOLE-based ZKs mainly differ in the way they handle multiplication gates. State-of-the-art VOLE-based ZK [DIO21, YSWW21] handles multiplication gates via a technique called line-point zero-knowledge (LPZK). At a high level, LPZK [DIO21] proves the multiplication relations o_i = l_i · r_i for n multiplication gates by utilizing (1) another random IT-MAC and (2) algebra over IT-MAC shares. The technique can be achieved at the cost of (1) V sending a random challenge and (2) P sending 2 field elements. Each party computes O(n) field operations. We denote the procedure to prove multiplications for IT-MACs as LPZK(·), which we use as a black-box. LPZK has (n + 2)/p^q soundness error and information-theoretic security in the F^{p,q}_sVOLE-hybrid model [YSWW21].
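To illustrate how one challenge and two field elements can cover n multiplication gates, here is a hedged sketch in the style of the QuickSilver batched multiplication check. The paper defers the actual procedure to Appendix A.2, so the formulas below are an assumption, adapted to the sign convention m = k − x · ∆ used above. The sketch uses a large prime field, simulates commitments locally, and interleaves P's and V's computations in one function for brevity.

```python
import random

P = 2**61 - 1  # illustrative large prime field, so no extension field is needed

def commit(x, delta):
    """Simulate an IT-MAC [x]: returns P's share (m, x) and V's key k with m = k - x*delta."""
    k = random.randrange(P)
    return ((k - x * delta) % P, x % P), k

def check_multiplications(triples, delta):
    """triples: list of ([l], [r], [o]) commitments. Returns V's accept/reject decision."""
    chi = random.randrange(P)                                    # (1) V's random challenge
    (m_mask, u_mask), k_mask = commit(random.randrange(P), delta)  # the extra random IT-MAC
    U, V, W, coeff = m_mask, u_mask, k_mask, 1
    for ((ml, l), kl), ((mr, r), kr), ((mo, o), ko) in triples:
        coeff = coeff * chi % P
        U = (U + coeff * (ml * mr)) % P                 # P: constant term
        V = (V + coeff * (l * mr + r * ml - mo)) % P    # P: coefficient of delta
        W = (W + coeff * (kl * kr - ko * delta)) % P    # V: computed from his keys only
    # (2) P sends the two field elements (U, V); V accepts iff W = U + V*delta.
    return W == (U + V * delta) % P
```

The check works because, with k = m + x · ∆, the quantity k_l · k_r − k_o · ∆ equals m_l · m_r + (l · m_r + r · m_l − m_o) · ∆ + (l · r − o) · ∆^2, so the ∆^2 term vanishes exactly when the gate is correct.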
QuickSilver [YSWW21] subsequently generalized LPZK to efficiently handle arbitrary polynomials over committed values. Our protocols use this trick for proving the inner product of IT-MACs. Namely, given 2m IT-MACs ([x_1]_∆, ..., [x_m]_∆) and ([y_1]_∆, ..., [y_m]_∆), QuickSilver shows how to efficiently prove that x_1 y_1 + ... + x_m y_m = 0. The proof requires O(1) communication and O(m) computation. Further, incorporating a random challenge from V, k inner-product proofs can be batched, preserving O(1) communication. We denote the procedure to prove k-batched inner products for IT-MACs as QS(·). We will use it as a black-box subprotocol. This sub-proof, as shown by [YSWW21], is zero-knowledge with (k + 2)/p^q soundness error and information-theoretic security in the F^{p,q}_sVOLE-hybrid model. We defer additional details on LPZK and QS to Appendix A.2.
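The inner-product variant can be sketched in the same way. This is an illustrative single-instance version only (no batching of k proofs), written under the assumption m = k − x · ∆ and over a large prime field; it is not taken from the paper's appendix, and the names are hypothetical.

```python
import random

P = 2**61 - 1  # illustrative large prime field

def commit(x, delta):
    """Simulate an IT-MAC [x]: P holds (m, x), V holds k, with m = k - x*delta."""
    k = random.randrange(P)
    return ((k - x * delta) % P, x % P), k

def check_inner_product_zero(xs, ys, delta):
    """xs, ys: equal-length lists of commitments. V accepts iff sum_i x_i*y_i = 0 (w.h.p.)."""
    (m_mask, u_mask), k_mask = commit(random.randrange(P), delta)  # random masking IT-MAC
    U, V, W = m_mask, u_mask, k_mask
    for ((mx, x), kx), ((my, y), ky) in zip(xs, ys):
        U = (U + mx * my) % P            # P: constant term
        V = (V + x * my + y * mx) % P    # P: coefficient of delta
        W = (W + kx * ky) % P            # V: uses only his keys
    # P sends just (U, V): O(1) communication, O(m) local work for each party.
    return W == (U + V * delta) % P
```

Summing k_x · k_y over all pairs leaves a ∆^2 coefficient equal to the inner product itself, so V's linear check in ∆ passes exactly when the inner product is 0 (up to the soundness error).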

Technical Overview
In this section, we present our techniques with sufficient detail to understand our contribution. Our ZK protocol considers standard arithmetic circuits with fan-in-2 gates. For ease of presentation, throughout this section, we consider circuits defined over a prime field F_p where p is large enough to achieve the desired soundness (e.g., p = 2^61 − 1) without using VOLE with an extension field.
Recall that our goal is to extend VOLE-based ZK such that it can efficiently handle proofs of (batched) disjunctive statements.
Consider B circuits C_1, ..., C_B (each called a branch) with the same number of input wires and multiplication gates, padded if needed. To begin, suppose P wishes to prove a single disjunction (we will discuss batching shortly). I.e., P wishes to prove C_1(w) = 0 ∨ ... ∨ C_B(w) = 0. Note that basic VOLE-based ZK (e.g., [YSWW21]) scales with the number of branches B, both in communication and computation. The primary source of this cost is simply the commitment of P's extended witness, which is linear in the total number of multiplication gates.
In our approach, P commits to a much shorter string whose length scales only with the number of multiplications (and inputs) in a single branch.This reduction in the size of the committed string is the source of our improvement.
In slightly more detail, P commits to a modified version of the extended witness w_ext. In addition to the inputs of the active circuit, w_ext includes input/output wire values of each multiplication gate of the active branch. We use out(w_ext) (resp. left(w_ext), right(w_ext)) to denote the vector of multiplication-gate outputs (resp. mult-gate left inputs, mult-gate right inputs) taken from w_ext. From here, P proves that the committed multiplication inputs/outputs indeed respect multiplication. Namely, P proves out(w_ext) = left(w_ext) • right(w_ext), where • denotes the elementwise product. This check is performed using the techniques of prior work (LPZK). Note that the number of checks does not scale with the number of branches.
So far, P has simply introduced and committed to a length-n_in vector of inputs and a length-n_× vector of tuples, each of which respects multiplication. The remaining task is to force P to choose this vector such that it satisfies the structure of some active branch C_a. That is, P must prove that w_ext respects the topology of branch C_a, as well as the linear constraints implied by C_a's addition gates. As we will describe in detail later, we can enforce such constraints by introducing public matrices M_i of size O(|C|) × O(|C|), encoding the topology of C_i à la an adjacency matrix. For each branch C_i with matrix M_i, consider the following crucial equation:
M_i × w_ext = 0    (1)
Equation (1) has two notable properties:
• If P is honest and holds a witness that satisfies active branch a, Equation (1) will hold for branch a.
• If P attempts to cheat and does not have a valid witness, w.h.p. Equation (1) will not hold for any i.
We defer further details on the structure of these matrices until Section 3.3. It is worth noting that although the size of these matrices is O(|C|^2), we will demonstrate that all relevant operations we use on these matrices can be computed in time O(|C|).
Terminology. Our approach centers on the manipulation of matrices M_i which encode the topology of circuits C_i. We find it helpful to introduce terminology for these matrices.
• We refer to each matrix M as a topology matrix. M is a matrix of dimension O(|C|) × O(|C|).
• We will allow V to select random challenge vectors s, and we will consider products s^T × M. We refer to the resulting length-O(|C|) vector as a compressed topology vector.
• Additionally, we will right multiply compressed topologies by vectors. The result of such a multiplication is a scalar that we refer to as a compressed topology token.
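The three terms can be illustrated on a toy branch. The sketch below is an assumption-laden example rather than the paper's construction: it hand-writes a topology matrix for the hypothetical branch out = in1 · in2 + in1 − 10, orders columns as (in1, in2, l1, r1, o1, 1), and picks one plausible sign convention for the rows. The vector passed to the token computation already includes the appended constant 1.

```python
P = 2**61 - 1  # illustrative prime modulus

# Topology matrix of the toy branch out = in1*in2 + in1 - 10 (hypothetical example).
# Columns follow (in1, in2, l1, r1, o1, 1); the row sign convention is an assumption.
M = [
    [1, 0, -1, 0, 0, 0],    # left wire of the multiplication:  in1 - l1 = 0
    [0, 1, 0, -1, 0, 0],    # right wire of the multiplication: in2 - r1 = 0
    [1, 0, 0, 0, 1, -10],   # circuit output:                   in1 + o1 - 10 = 0
]

def compressed_topology_vector(s, M):
    """s^T x M: one field element per column of M."""
    return [sum(s[i] * M[i][j] for i in range(len(M))) % P for j in range(len(M[0]))]

def compressed_topology_token(cv, v):
    """Right-multiply a compressed topology vector by a vector, giving a single scalar."""
    return sum(a * b for a, b in zip(cv, v)) % P

def extended_witness(in1, in2):
    """Inputs, multiplication-gate wires, and the appended constant 1."""
    return [in1, in2, in1, in2, in1 * in2 % P, 1]
```

With the challenge s = (3, 5, 7), the witness (2, 4) satisfies the branch and its token is 0, while the non-satisfying inputs (3, 4) give the token 7 · 5 = 35, i.e., the challenge entry for the output row times the nonzero circuit output.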

Robin: Single Disjunction Protocol
In the single instance setting, we wish to improve communication. Recall that we consider statements of the form (C_1(w) = 0 ∨ ... ∨ C_B(w) = 0). Our goal is to achieve communication that scales with B + |C|, not B|C|, while preserving the low computation cost of the basic VOLE-based ZK.
Our key insight is that V can challenge P with a random vector s after P commits to w_ext. Both parties can then use the IT-MAC commitment [w_ext]_∆ to homomorphically (i.e., without further communication) derive B IT-MAC commitments of the following compressed topology tokens:
[s^T × M_1 × w_ext]_∆, ..., [s^T × M_B × w_ext]_∆
Crucially, we prove that from the properties of Equation (1), these B IT-MAC commitments have the following induced properties:
• If P is honest and commits a witness that satisfies active branch a, [s^T × M_a × w_ext]_∆ will always be [0]_∆.
• If cheating P does not have a valid witness, then with overwhelming probability, none of these values will be [0]_∆.
To complete the proof, P simply needs to demonstrate that one of these committed tokens is a 0. We achieve this in a direct way: we run a (much smaller) VOLE-based ZK proof demonstrating that the product of the B elements is 0.
All in all, P demonstrates: There exists an extended witness w_ext s.t. for a random challenge s, the following holds: (i) the committed multiplication inputs/outputs in w_ext respect multiplication (multiplication check), and (ii) the product of the B compressed topology tokens s^T × M_i × w_ext, i ∈ [B], is 0.
Note that the order of quantifiers in the above statement is crucial, implying the order in which the proof proceeds. In short, the full proof proceeds as follows:
1. P commits to the extended witness w_ext.
2. P and V check that multiplication wires are properly committed by using the existing LPZK technique.
3. V sends to P the random challenge vector s.
4. P and V locally compute the B commitments [s^T × M_i × w_ext]_∆ for i ∈ [B].
5. P and V use VOLE-based ZK to prove that the product of these B commitments is 0.
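The flow of the single-disjunction proof can be sketched as follows. This is a simulation under stated assumptions, not the paper's protocol code: branches are hypothetical toy circuits out = in1 · in2 + in1 − c with hand-written topology matrices, IT-MACs are simulated locally with m = k − x · ∆, the multiplication check is omitted, and the final product proof is replaced by computing the product in the clear.

```python
import random

P = 2**61 - 1  # illustrative prime modulus

def topology(c):
    """Toy branch out = in1*in2 + in1 - c; columns follow (in1, in2, l1, r1, o1, 1)."""
    return [[1, 0, -1, 0, 0, 0], [0, 1, 0, -1, 0, 0], [1, 0, 0, 0, 1, -c]]

def left_mul(s, M):
    return [sum(s[i] * M[i][j] for i in range(len(M))) % P for j in range(len(M[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b)) % P

def robin_tokens(w_ext, branches, delta):
    # P commits to w_ext; both parties append [1], which needs no communication
    # (P's MAC is 0 and V's key is delta).
    vals = list(w_ext) + [1]
    keys = [random.randrange(P) for _ in w_ext] + [delta]
    macs = [(k - x * delta) % P for k, x in zip(keys, vals)]
    # V samples s only after the commitment is fixed.
    s = [random.randrange(P) for _ in range(len(branches[0]))]
    tokens = []
    for M in branches:
        cv = left_mul(s, M)                       # compressed topology vector
        x, m, k = dot(cv, vals), dot(cv, macs), dot(cv, keys)
        assert m == (k - x * delta) % P           # locally derived shares form a valid IT-MAC
        tokens.append(x)
    return tokens

def product(tokens):
    out = 1
    for x in tokens:
        out = out * x % P
    return out  # stands in for the small VOLE-based ZK proof that the product is 0
```

With branches for c ∈ {10, 11, 20}, the witness (2, 4) yields a zero token for the first branch and hence a zero product, while a non-satisfying witness such as (3, 4) yields only nonzero tokens with overwhelming probability over s.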

Batchman: Batched Disjunctions Protocol
In the batched setting, we wish to improve not only communication, but also computation. Recall, we consider the statement (C_1(w_j) = 0 ∨ ... ∨ C_B(w_j) = 0) on R different witnesses. Our goal is to achieve computation that scales with B + R, not BR. As a first attempt, one could try simply applying our single instance approach R times; this fails, because computing each commitment [s^T × M_i × w_ext]_∆ requires O(|C|) field operations, and so ultimately this attempt uses O(RB|C|) field operations.
As a second attempt, one could use RAM-based ZKs.While this works for large R, it does not match our asymptotics for small R and is concretely expensive; see Section 1.2.
Our batched approach relies on three key insights:
1. P knows the active branch C_a (and its matrix M_a) for each repetition.
2. It is safe to re-use the challenge vector s across all R instances.
3. The compressed topology vector s T × M a is small, having length only O(|C|) field elements.
Thus, for each repetition j ∈ [R], we can require that P commits to her extended witness w^(j)_ext and to her desired branch a^(j). In particular, if P is honest, she will commit to the compressed topology vector of the active branch as [cv^(j) ≜ s^T × M_{a^(j)}]_∆. From here, the parties use a regular VOLE-based ZK proof (QS) to show that P's committed witness respects the committed compressed topology vector. Namely, they check:
(cv^(j))^T × w^(j)_ext = 0
Crucially, the computation cost of this inner product check does not scale with the number of branches B, because P directly chooses and commits to only the active branch. Suppose that a cheating P does not have a witness for some repetition j. Based on our reasoning in Section 3.1, passing the above check is negligibly likely, if cv^(j) is equal to the compressed topology vector of some branch. Of course, it might be the case that cheating P committed to some vector cv^(j) which is not equal to any branch's compressed topology vector.
To repair this, we add a step to validate that P indeed committed to the compressed topology of some branch. In particular, we allow V to issue a second challenge vector t, and then the parties once and for all precompute the following compressed topology tokens:
ct_i ≜ s^T × M_i × t for each i ∈ [B]
Computing these values takes time proportional to B, but not proportional to R, as each value is computed exactly once; once computed, P and V re-use these values in each of the R batched proof instances.
The above validation will catch a cheating P with overwhelming probability (in the size of the field F). More precisely, we observe (and prove . Furthermore, for each j ∈ [R], parties already hold [cv^(j)]_∆, so they can locally compute [(cv^(j))^T × t]_∆ and perform a regular VOLE-based ZK proof to show that this value equals one of the tokens ct_1, ..., ct_B. Each token ct_i is a single field element, so this check is efficient.
All in all, P demonstrates: There exist R extended witnesses w^(1)_ext, ..., w^(R)_ext s.t. for a random challenge s there exist R vectors cv^(1), ..., cv^(R) s.t. for a random challenge t, the following holds for every j ∈ [R]: (i) the committed multiplication inputs/outputs in w^(j)_ext respect multiplication, (ii) (cv^(j))^T × w^(j)_ext = 0, and (iii) (cv^(j))^T × t = s^T × M_i × t for some i ∈ [B].
The order of quantifiers in the above statement is crucial, implying the order in which the proof proceeds. In short, the full batched proof proceeds as follows:
• P commits to each extended witness w^(j)_ext.
• P and V check multiplication wires are properly committed by using the existing LPZK technique.
• V sends to P the random challenge vector s.
• P commits to each compressed topology cv (j) .
• P and V check that each inner product (cv^(j))^T × w^(j)_ext is equal to zero by using the existing QS technique.
• V sends to P the random challenge vector t.
• P and V locally compute s T × M i × t for each i ∈ [B].
• P and V use VOLE-based ZK to prove that each committed value (cv^(j))^T × t is a valid compressed topology token.
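The algebra behind the batched steps above can be checked with a short simulation in the clear. The sketch makes several assumptions that are not from the paper: commitments and the LPZK/QS subprotocols are omitted, branches are hypothetical toy circuits out = in1 · in2 + in1 − c, witness vectors already include the appended constant 1, and token membership is tested by checking that the product of (cv^T × t − ct_i) over all i is 0, which is only one possible way to realize the final step.

```python
import random

P = 2**61 - 1  # illustrative prime modulus

def topology(c):
    """Toy branch out = in1*in2 + in1 - c; columns follow (in1, in2, l1, r1, o1, 1)."""
    return [[1, 0, -1, 0, 0, 0], [0, 1, 0, -1, 0, 0], [1, 0, 0, 0, 1, -c]]

def left_mul(s, M):
    return [sum(s[i] * M[i][j] for i in range(len(M))) % P for j in range(len(M[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b)) % P

def batchman_accepts(aux_vectors, choose_cv, branches):
    """aux_vectors: R witness vectors; choose_cv(j, compressed) models P's committed cv^(j)."""
    s = [random.randrange(P) for _ in range(len(branches[0]))]    # sent after P fixes witnesses
    compressed = [left_mul(s, M) for M in branches]               # one-time O(B|C|) work
    cvs = [choose_cv(j, compressed) for j in range(len(aux_vectors))]
    if any(dot(cv, aux) != 0 for cv, aux in zip(cvs, aux_vectors)):  # O(R|C|) inner products
        return False
    t = [random.randrange(P) for _ in range(len(branches[0][0]))]  # sent after P fixes the cvs
    tokens = [dot(c, t) for c in compressed]                       # ct_i = s^T M_i t, computed once
    for cv in cvs:                                                 # O(RB) membership checks
        prod = 1
        for ct in tokens:
            prod = prod * (dot(cv, t) - ct) % P
        if prod != 0:
            return False
    return True
```

Note how the work splits into a one-time O(B|C|) term, an O(R|C|) term for the inner products, and an O(RB) term for the membership checks, mirroring the claimed O(RB + R|C| + B|C|) cost. A cheating P who commits to the all-zero vector passes the inner-product check but is caught by the membership check with overwhelming probability over t.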

Topology Matrices
We now discuss how we construct and use branch-specific public topology matrices M. Recall, these matrices allow V to verify that P's extended witness indeed satisfies the structure of some branch of C_1, ..., C_B. This verification is achieved by Equation (1). Recall, our extended witness w_ext includes (1) P's witness w and, for each multiplication gate in the active branch, (2) its output wire and (3) its two input wires.
Consider a branch C, and suppose that we remove each multiplication gate from C. Whenever we remove a multiplication gate, we replace its input and output wires with inputs to C. What remains is a skeleton of the circuit containing only addition gates that expresses a linear relationship on the extended witness (and C's output). It is convenient to encode this linear relationship as a matrix M ∈ F^{(2C_× + 1) × (C_in + 3C_× + 1)}, and we refer to this matrix as a topology matrix. Figure 3 shows an example.
Note that w_ext is a valid extended witness for C if and only if:
1. Multiplication gates in w_ext are formed correctly.
2. w_ext (with 1 appended) satisfies the linear relationship encoded by M, i.e., M × glue(w_ext, 1) = 0.
The above requirements imply that, for an invalid extended witness w_ext, if Item 1 is satisfied, Item 2 will not be satisfied. This is precisely our Equation (1) and associated properties with one caveat: we did not append 1 to w_ext. This can be trivially fixed, because P and V can locally generate shares of [1]_∆.
Efficient operations on topology matrices. Recall that we left multiply topology matrices M by vectors s^T: we compute s^T × M.
Computed naïvely, the above multiplication is expensive. Indeed, M can be dense, due to unlimited fan-out from addition gates. Therefore, storing M and naïvely computing the product will incur O(|C|^2) overhead, far exceeding our asymptotic budget.
Perhaps surprisingly, given the gate-by-gate representation of C, this multiplication can be computed in time O(|C|) with our technique of "evaluating C backwards"; see Section 4. We name the corresponding algorithm MUL_LEFT. MUL_LEFT never explicitly computes M. Thus, the topology matrix M is merely an analysis tool, and our protocols work entirely with efficient gate-by-gate circuit representations. In other words, it suffices to think of circuits as topology matrices, while in reality all algorithms operate on compact gate-by-gate representations.
Figure 3: A circuit (left) and its corresponding topology matrix (right). Note, the shaded portion of the matrix is dense.

Formalizing Topology Matrices
In this section, we formalize topology matrices, a tool used to prove the correctness and security of our approach; see Section 3. We also give an algorithm that allows efficient vector-matrix multiplication on topology matrices.
Linear constraint on a wire. Consider a wire w_k in a circuit C. The wire w_k can be defined as a linear combination of input wires of C and output wires of all multiplication gates. We call this linear combination the linear constraint on w_k. Following this, a circuit's linear constraints can be captured by its associated topology matrix (see Figure 3 for an example):
Definition 1 (Topology Matrix). Let C denote a circuit over some field F such that C has n_in input wires and n_× multiplication gates. The topology matrix associated with C is a (2n_× + 1) × (n_in + 3n_× + 1) matrix over F defined as follows.
Let aux ≜ (in_1, ..., in_{n_in}, ℓ_1, ..., ℓ_{n×}, r_1, ..., r_{n×}, o_1, ..., o_{n×}, 1)^T denote a vector of circuit metadata. Here, in_k represents the kth input, ℓ_k (resp. r_k, o_k) represents the left (resp. right, output) wire of the kth multiplication gate, and 1 is the multiplicative identity of F. The rows of the topology matrix M are:
1. Left wires: For the first n× rows, for each k ∈ [n×], M(k) × aux is the linear constraint on ℓ_k.
2. Right wires: For the second n× rows, for each k ∈ [n×], M(n× + k) × aux is the linear constraint on r_k.
3. Circuit output: For the last row M(2n× + 1), M(2n× + 1) × aux is the linear constraint on the output of the circuit.
Figure 4: A circuit's induced DAG. In the topology matrix of this circuit, the last row defines out ≜ 6in_2 + 7in_3 + 6o_1. In this DAG, there are 6 paths from in_2 to out, 7 paths from in_3 to out, and 6 paths from o_1 to out. E.g., from o_1, there are 5 paths (dashed) passing through the scale gate and 1 path (dotted) passing through the addition gate +_4, so there are 6 paths in total. The topology matrix reflects these numbers of paths. Since there is no offset gate, the unit vertex is isolated.
Item 3 can be naturally extended to capture circuits with multiple outputs.Note that for aux to be a valid extended witness, M × aux must be the all zeros vector.
Left multiplication for topology matrices. Recall, our P and V left multiply topology matrices M_i by random vectors s^T. Using naïve vector-matrix multiplication, computing s^T × M requires O(|C|^2) field operations, exceeding our asymptotic budget. Instead, we propose an efficient algorithm called MUL_LEFT to support the above operation. Given the gate-by-gate circuit representation of C, our algorithm essentially evaluates C "backwards", gate by gate, in O(|C|) operations without ever writing down the matrix M.
It is not obvious that this multiplication can be achieved in O(|C|) operations, as the matrix M can be dense due to high fan-out addition gates (see Figure 3 as an example).While the matrix M is not sparse, it is highly structured: indeed, the circuit C is itself a succinct representation of M , and our algorithm exploits this.
Our O(|C|) solution. Algorithm 1 presents MUL_LEFT. To understand our algorithm, we analyze the semantics of topology matrices. Let C denote a circuit, and consider C's underlying directed acyclic graph G; i.e., the vertices in G represent gates and edges in G represent wires (see Figure 4 for an example). Now, remove each vertex corresponding to a multiplication gate in C. Additionally, add one special vertex to G called the unit vertex, which will denote a wire holding value 1 to capture offset gates.
Let M denote the topology matrix associated with C. Now, consider the first row of M (denoted as M(1)). As specified by Definition 1, this row defines the linear constraint on ℓ_1, the left input wire of C's first multiplication gate. The first element of M(1) can be understood as the number of paths in G that start at vertex in_1 and terminate at vertex ℓ_1. (Arithmetic circuits admit scalar gates which scale the input by a public constant; for a gate with scalar s, we say that there are s paths from that gate's input to its output. We also consider offset gates which add a public constant to a wire; for a gate with offset s, we say that there are s paths from the unit vertex to the gate output.) See Figure 4 for an example.
Data: circuit C, vector s
Result: s^T × M
  w = 0^(|C.wid|), defined over the extension field;
  acc = 0, defined over the extension field;
  place the entries of s onto the wires ℓ_1, ..., ℓ_{n×}, r_1, ..., r_{n×}, out, and evaluate the linear gates of C backwards, accumulating wire values in w and unit-vertex (offset) contributions in acc;
  for each input wire in_k in order do res.append(w[in_k]);
  for each k ∈ [2n×] do res.append(−s[k]);
  for each multiplication gate (ℓ_k, r_k, o_k) in order do res.append(w[o_k]);
  res.append(acc);
  return res
More generally, M (i)[j] can be understood as the number of paths from auxiliary wire (see Definition 1) aux j to multiplication gate input i.There are two special cases: (1) we define the number of paths from a wire to itself to be −1; (2) the last row determines the number of paths to the circuit output wire (not multiplication).Now, consider the first column of M (denoted as M [1]).Based on the above analysis, this column can be understood as the number of paths from in 1 to ℓ 1 , . . ., ℓ n × , r 1 , . . ., r n × , out.The crucial point is this: in the graph G, the number of paths from wire a to wire b is trivially equal to the number of backwards paths (i.e., paths through the graph with all edges reversed) from b to a. Thus, if we wish to compute the inner product of some vector s with M [1], we can (1) put those values of s onto the wires ℓ 1 , . . ., ℓ n × , r 1 , . . ., r n × , out, (2) evaluate the circuit (with multiplication gates removed) backwards and (3) output the value on wire in 1 .
Note that backwards evaluation of linear gates has a clear interpretation. In particular, for an addition gate we add together its output wire values, then place the sum onto the two input wires.
Therefore, to compute the full vector-matrix product s^T × M, we simply evaluate the arithmetic gates backwards, and then output wire values in the order prescribed by aux. This is precisely the approach of Algorithm 1. Because we evaluate each linear gate exactly once, the complexity of Algorithm 1 is O(|C|).
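The backwards evaluation described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the gate encoding (tuples tagged "add", "scale", "offset", "mul"), the function name mul_left, and the choice of a prime field with q = 1 are all assumptions made for the example.

```python
# Illustrative sketch only; the gate encoding below is hypothetical.
P = 2**61 - 1  # example prime modulus (extension degree q = 1 assumed)

def mul_left(inputs, gates, out_wire, n_wires, s, p=P):
    """Return s^T x M without ever building the topology matrix M.

    gates is a topologically ordered list of tuples:
      ("add", out, a, b)         out = a + b
      ("scale", out, a, c)       out = c * a
      ("offset", out, a, c)      out = a + c
      ("mul", out, left, right)  out = left * right
    s has length 2*n_mul + 1 (left-wire rows, right-wire rows, output row).
    """
    n_mul = sum(1 for g in gates if g[0] == "mul")
    assert len(s) == 2 * n_mul + 1
    w = [0] * n_wires        # backwards wire values
    acc = 0                  # value collected at the unit vertex
    o_vals = [0] * n_mul     # entries for the columns o_1 .. o_{n_mul}
    w[out_wire] = s[2 * n_mul] % p
    k = n_mul                # multiplication gates are visited last-to-first
    for gate in reversed(gates):
        kind, out = gate[0], gate[1]
        if kind == "mul":
            k -= 1
            _, _, left, right = gate
            o_vals[k] = w[out]                 # final: all consumers were visited
            w[left] = (w[left] + s[k]) % p     # seed the row of the left wire
            w[right] = (w[right] + s[n_mul + k]) % p
        elif kind == "add":
            _, _, a, b = gate
            w[a] = (w[a] + w[out]) % p
            w[b] = (w[b] + w[out]) % p
        elif kind == "scale":
            _, _, a, c = gate
            w[a] = (w[a] + c * w[out]) % p
        elif kind == "offset":
            _, _, a, c = gate
            w[a] = (w[a] + w[out]) % p
            acc = (acc + c * w[out]) % p
        else:
            raise ValueError(f"unknown gate kind: {kind}")
    res = [w[i] for i in inputs]
    res += [(-x) % p for x in s[:2 * n_mul]]   # the -1 self-path entries
    res += o_vals
    res.append(acc)
    return res

# Example: wires 0 and 1 are inputs; out = 3*o1 + 5 + (in1 + in2).
example_gates = [
    ("add", 2, 0, 1),      # w2 = in1 + in2
    ("mul", 3, 2, 0),      # o1 = w2 * in1
    ("scale", 4, 3, 3),    # w4 = 3 * o1
    ("offset", 5, 4, 5),   # w5 = w4 + 5
    ("add", 6, 5, 2),      # out = w5 + w2
]
result = mul_left([0, 1], example_gates, out_wire=6, n_wires=7, s=[2, 3, 4])
```

For this toy circuit the dense topology matrix has rows (1, 1, −1, 0, 0, 0), (1, 0, 0, −1, 0, 0) and (1, 1, 0, 0, 3, 5), so result agrees with the direct product (9, 6, −2, −3, 12, 20) taken modulo P, while each gate is visited exactly once.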

Soundness Lemmas
As discussed in Section 3, our protocols heavily rely on the fact that V can issue random vectors to compress commitments, leading to small proofs. Formally, these random challenges preserve soundness based on the following lemmas and associated corollaries, which are the kernel of our protocols and proofs.
Lemma 1. Consider a field F and let k, m ∈ Z^+. Consider k arbitrary non-zero vectors x^(1), ..., x^(k) ∈ F^m. The following holds:
Pr[ ∃ i ∈ [k] : (x^(i))^T × s = 0 | s ←$ F^m ] ≤ k/|F|.
Corollary 1. If s is drawn from the extension field F^q where q ∈ Z^+, the upper bound of Lemma 1 is k/|F|^q.
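As a sanity check of the counting behind Lemma 1, the following sketch exhaustively enumerates all challenges s over a toy field and compares the fraction of "bad" challenges with k/|F|. The field size 5, the dimension 3, and the three vectors are arbitrary choices for illustration; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def bad_challenge_fraction(vectors, p):
    """Fraction of s in F_p^m with <x, s> = 0 for at least one x in vectors."""
    m = len(vectors[0])
    bad = 0
    for s in product(range(p), repeat=m):
        if any(sum(xj * sj for xj, sj in zip(x, s)) % p == 0 for x in vectors):
            bad += 1
    return Fraction(bad, p**m)

# Three non-zero vectors over F_5; the third is a multiple of the first,
# so the union bound k/|F| is not tight here.
vectors = [(1, 2, 0), (0, 1, 1), (2, 4, 0)]
fraction = bad_challenge_fraction(vectors, p=5)
bound = Fraction(len(vectors), 5)
```

Each non-zero vector is orthogonal to exactly |F|^(m−1) = 25 of the 125 challenges; the union covers 45 of them, so the measured fraction 9/25 stays below the bound 3/5.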
Protocol Π^{p,q}_Single
Inputs. The prover P and the verifier V hold B circuits C_1, ..., C_B over any field F_p, where each circuit has n_in inputs and n× multiplication gates. Prover P also holds a witness w and an integer a ∈ [B] such that C_a(w) = 0 and |w| = n_in.
Generate extended witness on C_a.
0. P evaluates C_a(w) and generates ℓ, r, o ∈ F_p^{n×}, where ℓ (resp. r, o) denotes the values on left (resp. right, output) wires of each multiplication gate, in topological order.
Initialize/Preprocessing.
1. P and V send (init) to F^{p,q}_sVOLE, which returns a uniform ∆ ∈$ F_{p^q} to V.
2. P and V send (extend, n_in + 3n×) to F^{p,q}_sVOLE, which returns IT-MACs to the parties.
3. P and V send (extend, q(B − 1)) to F^{p,q}_sVOLE, which returns q(B − 1) IT-MACs of random values over F_p. P and V then combine these IT-MACs into (B − 1) IT-MACs of random values over F_{p^q}, denoted as {[τ_i]}_{i∈[B−1]}.
Commit to extended witness on C_a.
P convinces V that M_a × w_ext = 0 without leaking a, i.e., that there exists a satisfied circuit.
9. V samples a random vector s ∈$ F^{2n×+1}_{p^q} and sends it to P.

Formal Protocol and Analysis
We refer the reader to Section 3.1 for the intuition behind our ZK protocol for disjunctive circuit satisfiability in the single instance setting. Figure 5 formalizes our protocol; its main security property is as follows:
Theorem 1 (Single Disjunction Security). Π^{p,q}_Single (Figure 5) UC-realizes F^{1,B}_ZK (Figure 1) in the F^{p,q}_sVOLE-hybrid model with soundness error (n× + 2B + 4)/p^q and information-theoretic security.
We provide a detailed proof of Theorem 1 in Appendix C.3; for now, we sketch the main argument.
Proof Sketch. The proof proceeds by constructing a simulator S and by extracting the witness from a malicious P.
For malicious verifier A, S interacts with the ideal functionality F 1,B ZK by running A as a subroutine.S implements the ideal functionality F p,q sVOLE on behalf of A. Therefore, S knows ∆, and it can use ∆ to prove any statement to A by opening commitments to whatever value it likes.S uses this capability to send to A messages identically distributed to honest P's real-world messages, which allows it to complete the ideal world execution.
For malicious prover A, the witness can be trivially extracted from messages sent to F p,q sVOLE .S runs a proof interaction with A by acting as honest V, and it sends the extracted witness to F 1,B ZK if the interaction leads to a successful proof.The only difference between the two worlds occurs when A successfully proves a false statement; this can occur when A manages to pass checks built into the protocol.In such cases, real-world V will accept the proof, whereas ideal-world V will reject, because S does not hold a valid witness.This discrepancy occurs with low probability because the protocol is sound.
Indeed, A must pass all checks, and the probability that checks erroneously pass is bounded by the (statistical) soundness of LPZKs in Steps 8 and 11 ((n× + B + 4)/p^q in total) and by the probability of the following bad event: Let w_bad denote a vector that is not an extended witness for any branch. Honest V samples a vector s such that s^T × M_i × w_bad = 0 for some i ∈ [B] in Step 10. Each M_i × w_bad is a non-zero vector, so this only happens with (statistical) probability at most B/p^q (see Corollary 1).
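The two contributions named in the sketch above add up to the soundness error stated in Theorem 1. Written out as a worked sum (the underbrace labels are explanatory annotations, not notation from the protocol):

```latex
\underbrace{\frac{n_\times + B + 4}{p^q}}_{\text{LPZK, Steps 8 and 11}}
\;+\;
\underbrace{\frac{B}{p^q}}_{\text{bad challenge } s \text{ (Corollary 1)}}
\;=\;
\frac{n_\times + 2B + 4}{p^q}
```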

Protocol cost. In total, Π^{p,q}_Single consumes the following resources:
• Communication. The parties transmit n_in + (2q + 3)n× + q(B + 7) = O(q|C| + qB) field elements. We next explain how to adjust Π^{p,q}_Single such that the number of transmitted field elements is only O(|C| + qB), suitable for small fields.
• Rounds. The protocol runs in 5 rounds.
Appendix D provides detailed explanation of this cost accounting.
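The communication count can be reproduced by tallying per-step contributions. The sketch below follows the step-by-step accounting of Appendix D; attributing the remaining q(2n× + 1) elements to V's challenge s in Step 9 is our reading of the protocol (s has 2n× + 1 entries over F_{p^q}), and the function names are hypothetical.

```python
def robin_communication(n_in, n_mul, B, q):
    """Per-step count of transmitted F_p elements for the single-disjunction protocol."""
    parts = {
        "commit extended witness (Steps 4-7)": n_in + 3 * n_mul,
        "first LPZK: challenge + response (Step 8)": q + 2 * q,
        "challenge s (Step 9, assumed attribution)": q * (2 * n_mul + 1),
        # Step 11: intermediate wires, output opening, LPZK challenge, LPZK response
        "product proof (Step 11)": q * (B - 1) + q + q + 2 * q,
    }
    return parts, sum(parts.values())

def robin_communication_closed_form(n_in, n_mul, B, q):
    """Closed form stated in the text: n_in + (2q + 3) n_mul + q (B + 7)."""
    return n_in + (2 * q + 3) * n_mul + q * (B + 7)

parts, total = robin_communication(n_in=10, n_mul=100, B=5, q=2)
```

With these toy parameters both the tally and the closed form give 734 elements; the q(2n× + 1) term is the one that the challenge-compression variants discussed next remove.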
Generating random challenges. Step 9 of Π^{p,q}_Single requires V to send a random challenge s of size O(q|C|) field elements. There are several methods to compress s such that it does not asymptotically dominate; these are standard, see, e.g., the discussion in [YSWW21]. These methods trade off in soundness, communication, and computation:
Powers of χ. V can send 1 random field element χ ∈ F_{p^q} and define s as (1, χ, χ^2, ..., χ^{2n×}). This variant uses O(|C| + qB) communication and O(|C|) computation. While this saves communication, it increases soundness error to (2Bn× + B + n× + 4)/p^q, because it increases the chance (see Lemma 2) that cheating P can randomly achieve an IT-MAC encoding of 0 on some branch.
Random Oracle.V can send a λ-bit seed, and the parties can use a random oracle (RO) to generate s.This variant has O(|C|+qB) communication, but the parties use computation to expand the RO.The soundness error (with extra random oracle assumption) is now t , where t denotes an upper bound of the number of RO queries made by the adversary.We implement this variant.
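Both compression methods can be sketched as follows. This is an illustration under simplifying assumptions, not the implementation's code: it works over a prime field with q = 1, uses SHAKE-256 from Python's standard library as a stand-in for the random oracle, and derives field elements by rejection sampling; the function names are hypothetical.

```python
import hashlib

P = 2**61 - 1  # example prime field, extension degree q = 1 assumed

def powers_challenge(chi, n_mul, p=P):
    """Derive s = (1, chi, chi^2, ..., chi^(2 n_mul)) from one field element."""
    s, cur = [], 1
    for _ in range(2 * n_mul + 1):
        s.append(cur)
        cur = cur * chi % p
    return s

def ro_challenge(seed, n_mul, p=P):
    """Expand a short seed into 2 n_mul + 1 field elements (SHAKE-256 as the RO stand-in)."""
    bits = p.bit_length()
    nbytes = (bits + 7) // 8
    out, counter = [], 0
    while len(out) < 2 * n_mul + 1:
        block = hashlib.shake_256(seed + counter.to_bytes(8, "big")).digest(nbytes)
        candidate = int.from_bytes(block, "big") >> (8 * nbytes - bits)
        if candidate < p:          # rejection sampling keeps the output uniform
            out.append(candidate)
        counter += 1
    return out

s_powers = powers_challenge(3, n_mul=2)
s_ro = ro_challenge(b"seed", n_mul=4)
```

In both cases V transmits a constant amount of data (one field element or one seed) and both parties expand it locally, which is where the extra computation mentioned above comes from.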

Batchman: Batched Disjunctions
We refer the reader to Section 3.2 for the intuition of our ZK protocol for batched disjunctive circuit satisfiability. Figure 6 formalizes our protocol; its main security property is as follows:
Theorem 2 (Batched Disjunctions Security). Π^{p,q}_Batch (Figure 6) UC-realizes F^{R,B}_ZK (Figure 1) in the F^{p,q}_sVOLE-hybrid model with soundness error (Rn× + R + 3B + 6)/p^q and information-theoretic security.
We provide a proof of Theorem 2 in Appendix C.4. In short, the proof is very similar to that of Theorem 1, except that we must additionally account for (1) the soundness of QS in Step 13 and (2) an additional bad event made possible by the check on P's committed topology vectors in Step 15.
• Rounds.The protocol runs in 7 rounds.
Appendix D provides detailed explanation of this cost accounting.
Field size. Unlike our single disjunction protocol, our batched protocol improves on prior work w.r.t. communication only for large fields. This is because in Step 12, P commits to R compressed topology vectors, and these are defined over the extension field. If one wishes to work with a small field (e.g., Boolean), repeating our single disjunction protocol is more effective w.r.t. communication.
Generating random challenges. As in our single disjunction protocol, we can reduce communication needed for V's random challenge vectors s and t by applying standard methods. In particular, these challenges can be generated using a two-row Vandermonde matrix of two random field elements, or using a random oracle. We implement the RO variant.
Constraining Batch Witnesses. Batched disjunctions allow P to prove the same disjunction with respect to R witnesses. This is only interesting if we impose additional constraints on P's witnesses; otherwise, P with only one witness can trivially re-use her witness R times to satisfy the full statement. In Appendix E we discuss simple methods for extending our protocol with additional constraints that force P to prove her R witnesses are related.
Protocol Π^{p,q}_Batch
Inputs. P and V agree on B circuits C_1, ..., C_B over any field F_p, where each circuit has n_in inputs and n× multiplication gates. P holds R witnesses w^(1), ..., w^(R) and R integers a^(1), ..., a^(R) ∈ [B] such that for all j ∈ [R], C_{a^(j)}(w^(j)) = 0 and |w^(j)| = n_in.
Initialize/Preprocessing.
1. P and V send (init) to F^{p,q}_sVOLE, which returns a uniform ∆ ∈$ F_{p^q} to V.
2. P and V send (extend, R(n_in + 3n×)) to F^{p,q}_sVOLE, which returns IT-MACs {[µ
3. P and V send (extend, qR(n_in + 3n× + 1)) to F^{p,q}_sVOLE, which returns qR(n_in + 3n× + 1) IT-MACs of random F_p values. P and V combine these IT-MACs into R(n_in + 3n× + 1) IT-MACs of random F_{p^q} values, denoted {[η
4. P and V send (extend, qR(B − 1)) to F^{p,q}_sVOLE, which returns qR(B − 1) IT-MACs of random F_p values. P and V combine these IT-MACs into R(B − 1) IT-MACs of random F_{p^q} values, denoted {[τ
Commit to extended witnesses on C_{a^(1)}, ..., C_{a^(R)}. For each j ∈ [R], proceed as follows:
k ∈ F_p to V, and then both compute [w
k ∈ F_p to V, and then both compute [ℓ
k ∈ F_p to V, and then both compute [r
Check multiplication gates. P convinces V that the Rn× committed multiplication gates are well-formed.
9. P and V run a VOLE-based zero-knowledge proof for (batched) multiplications LPZK({[ℓ
If ZKP fails, V outputs (false) and halts.

Generate compressed topology vectors. Let M_1, ..., M_B denote the topology matrices of C_1, ..., C_B.
10. V samples a random vector s ∈$ F^{2n×+1}_{p^q} and sends it to P.
11. For each i ∈ [B], P and V compute cv_i := s^T × M_i.
Commit compressed topology vectors. For each j ∈ [R]:
k ∈ F_{p^q} to V, and then both compute [cv
Check satisfiability of committed compressed topology vectors. For each j ∈ [R], let w
I.e., the committed circuits are satisfied.
13. P and V run a VOLE-based zero-knowledge proof for (batched) inner-products QS({[cv^(j)], [w
If ZKP fails, V outputs (false) and halts.
Validate committed compressed topology vectors. P convinces V that cv^(j) ∈ {cv_1, ..., cv_B} for each j ∈ [R], i.e., the committed circuits are well-formed.
14. V samples a random vector t ∈$ F^{n_in+3n×+1}_{p^q} and sends it to P.
15. For each j ∈ [R], P and V run a VOLE-based zero-knowledge proof to show Π_{i∈[B]} (ct^(j) − ct_i) = 0 by using IT-MAC [ct^(j)] and public {ct_i}_{i∈[B]}. Note that this is a B-product circuit defined over F_{p^q}, so it can be performed with {[τ
and LPZK. If all R ZKPs succeed, V outputs (true); otherwise, V outputs (false).
Figure 6: Batchman: ZKP protocol for batched disjunctive circuit satisfiability over any field F p in the F p,q sVOLE -hybrid model.
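The membership test of Steps 14 and 15 can be illustrated in the clear, i.e., without IT-MACs or zero knowledge. The sketch below only shows the algebra: every vector is compressed to one field element with the challenge t, and the product over all branches vanishes exactly when the compressed candidate matches some compressed branch vector. The vectors, the challenge t, and the function names are made up for the example.

```python
P = 2**61 - 1  # example prime field, extension degree q = 1 assumed

def dot(u, v, p=P):
    return sum(a * b for a, b in zip(u, v)) % p

def membership_product(candidate_cv, branch_cvs, t, p=P):
    """Compute prod_i (ct - ct_i), where ct = <candidate_cv, t> and ct_i = <cv_i, t>."""
    ct_candidate = dot(candidate_cv, t, p)
    prod = 1
    for cv_i in branch_cvs:
        prod = prod * (ct_candidate - dot(cv_i, t, p)) % p
    return prod

branch_cvs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # stand-ins for cv_1, ..., cv_B
t = [2, 3, 5]                                    # stand-in for V's random challenge
member = membership_product([4, 5, 6], branch_cvs, t)
non_member = membership_product([1, 1, 1], branch_cvs, t)
```

Here member is 0 and non_member is non-zero. In the protocol t is uniformly random, so a vector outside the set collides with some ct_i only with small probability, and the product is evaluated under IT-MACs rather than on plain values.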

Implementation and Benchmarking
We implemented our ZK protocols for both Boolean circuits (field F_2) and for circuits over the Mersenne prime field F_{2^61−1}.
Our implementation extends the publicly available implementation of QuickSilver [YSWW21] (their code is part of the EMP Toolkit [WMK16]).We use their VOLE and LPZK implementations.
Our implementations achieve computational security parameter κ = 128 (for VOLE) and statistical security parameter λ ≥ 100 for Boolean and λ ≥ 40 for arithmetic circuits, matching QuickSilver.
Unless otherwise specified, our experiments were run on two Amazon EC2 m5.2xlarge machines (respectively implementing P and V). Our implementations run single threaded. Our implementation is publicly available at https://github.com/gconeice/stacking-vole-zk.
Benchmark.Unless otherwise specified, our experiments use a benchmark where each of the B branches features a matrix multiplication (implementing the naïve algorithm) where P wishes to prove that she knows two square ℓ × ℓ matrices whose product is equal to a public ℓ × ℓ matrix.Each such circuit has O(ℓ 3 ) gates.We acknowledge that this benchmark is contrived; its purpose is to evaluate performance only.

Robin: Single Disjunction Protocol
We compare Robin with the prior state-of-the-art VOLE-based ZK protocol supporting disjunctions: Mac'n'Cheese [BMRS21]. The Mac'n'Cheese implementation is not publicly available, so we use the numbers available in their paper.
[BMRS21] reported execution time when handling B branches, each consisting of ≈ 1 billion AND gates. Each branch computes 45000 iterations of the SHA-2 circuit. For these large Boolean branches, [BMRS21] uses an elegant trick based on [BBC+19] to reduce the communication cost of each AND gate to only 1 bit (rather than paying for two extension field elements per AND gate); this trick increases round complexity by factor O(log |C|).
We ran Robin on the same branches and the same network configuration. Due to size, we ran our experiment on two Amazon EC2 m5.8xlarge machines. Figure 7 tabulates the results.
[BMRS21]'s implementation does not include a VOLE backend, and it only achieves 40-bit statistical security. Our implementation includes a real VOLE backend with 100-bit statistical security. Because of these differences, it is difficult to present a completely fair comparison. Despite generating real VOLE correlations, Robin still improves performance. Figure 7 shows that we pay ≈ 25 seconds per extra branch, whereas [BMRS21] uses ≈ 150 seconds.
Figure 7 also tabulates the results for branches with matrix multiplication. This additional column demonstrates that our performance does not depend on branch structure.
QuickSilver [YSWW21] is a state-of-the-art VOLE-based ZK protocol for Boolean/arithmetic circuits. It uses O(B|C|) computation and communication with low constants, and it serves as the baseline for our approach. We compare our single disjunction protocol Robin with [YSWW21] on circuits defined over F_{2^61−1}. Asymptotically, Robin improves communication from O(B|C|) to O(B + |C|).

We compare using branches that each have 8 million multiplication gates, and we vary B between 5 and 100. Figure 8 plots our speedup. When network bandwidth is low (e.g., 100 Mbps), communication remains the bottleneck, and for B > 40 Robin achieves over 10× improvement. Even when network bandwidth is high (e.g., 500 Mbps), Robin improves performance by ≈ 4×, because Robin computes fewer VOLE correlations.

More evaluation
Appendix F includes further evaluation.

Batchman: Batched Disjunctions Protocol
Our batched protocol Batchman is best for circuits over large fields. Therefore, our evaluation considers circuits over F_{2^61−1}. The AntMan implementation is not publicly available, so we use numbers from the paper. To compare, we ran experiments on the same setup: two Amazon EC2 m5.8xlarge machines.

Comparison with AntMan [WYY+22]
[WYY+22] reported the execution of a batch of 1024 circuits where each circuit has 2^21 multiplication gates. Accordingly, we tested Batchman to ensure all branches in each repetition have 2^21 total multiplication gates. ([WYY+22] circuits are defined over F_{2^59 − 2^28 + 1}.) Figure 9 tabulates the results; higher numbers are better.
Batchman is sensitive to network bandwidth due to its O(R|C|) asymptotic scaling, but it is computation efficient. As B increases, our improvement also increases. In the extreme case where there are 512 branches and with 1 Gbps bandwidth, Batchman is 221× faster than (single-thread) AntMan. Of course, AntMan solves a more general problem than Batchman. However, for our special-case problem of batched disjunctions, we demonstrate significant improvement.

Comparison with QuickSilver [YSWW21] and Robin
We compare Batchman to the baseline QuickSilver [YSWW21] protocol and to repeated runs of Robin. We experiment with benchmarks satisfying R = B, and we consider branches with 1.25 × 10^5 multiplication gates. Figure 10 plots speedup as compared to QuickSilver.
Compared to QuickSilver, Robin only improves communication, limiting its speedup. On the other hand, Batchman improves both communication and computation, and our speedup is almost independent of network bandwidth. Our experiment shows that Batchman enjoys an extra 2× to 9× improvement as compared to Robin.

Fine-grained analysis
Figure 11 breaks down the runtime cost of Batchman.Most of the execution time is spent committing to the witness and to the compressed topology vectors.Figure 11

CPU emulation benchmark
Our final benchmark shows that Batchman is suitable to the use-case of CPU-emulation-based ZK.
We consider a proof-of-concept CPU (without RAM) with B = 50 instructions where each instruction has 125 multiplication gates.We vary R between 50K and 500K (guided by ZEE [HYDK21]) and calculate average CPU speed.While ZEE achieves a comparable Hz rate, it has a smaller branching factor B = 20, and, crucially, our CPU step is vastly more powerful in that it executes 125 multiplications per instruction, vs a single one in ZEE.
As shown in Figure 12, Batchman achieves 9× improvement as compared to QuickSilver [YSWW21]. We note that this is purely a proof of concept. To implement true CPU emulation based on Batchman, one needs to carefully design the instruction set, and ZK RAM (e.g., [FKL+21, DdSGOTV22]) needs to be incorporated.
• n_in, n×, n+, n_scale, n_offset, which respectively denote the number of input gates, multiplication gates, addition gates, scale gates, and offset gates.
• wid ≜ [m] denotes the collection of wire identifiers; the last identifier m identifies the output wire.
• IN ≜ {in k } k∈[n in ] denotes the input wires.
Theorem 3 (Circuit Satisfiability). A circuit C over some field F = (+, •) represented gate-by-gate (i.e., Definition 3) is satisfiable if and only if there exists a vector w ∈ F^m such that (1)
Recall the definition of topology matrices in Section 4. The satisfiability of a circuit can be alternatively stated using its topology matrix:
Theorem 4 (Circuit Satisfiability from Topology Matrix Multiplication). Let C denote a circuit over some field F, and let M denote C's associated topology matrix. C is satisfiable if and only if there exists a vector w ∈ F^{n_in+3n×+1} such that:
Proof. For each non-zero vector x^(i), i ∈ [k], there are |F|^{m−1} different s such that (x^(i))^T × s = 0. Thus, there will be at most k|F|^{m−1} choices of s to make the above event happen, which implies the upper bound
as this event is possible, it is unlikely. Indeed, the probability of this event is precisely the protocol soundness error. We calculate a bound on soundness error. To make real-world V output (true), a malicious P must pass three checks in sequence:
(i.e., the information-theoretic soundness of LPZK, see [YSWW21]). Now, suppose A uses R invalid extended witnesses w_ext^(1), ..., w_ext^(R) such that multiplications are well-formed (i.e., Item 1 passes).
If A commits to at least one invalid compressed topology vector, then A can pass Item 2 with probability at most (R + 2)/p^q (i.e., the information-theoretic soundness of QS, see [YSWW21]). This occurs when A uses R compressed topology vectors cv^(1), ..., cv^(R) such that there exists j ∈ [R] where (cv^(j))^T × w_ext^(j) ≠ 0. Now, suppose A commits to R valid compressed topology vectors that pass Item 2. We are left with Item 3, which checks that each of A's committed topology vectors indeed corresponds to the topology of some branch, i.e., cv^(j) ∈ {cv_1, ..., cv_B} for each j ∈ [R]. For each i, cv_i ≜ s^T × M_i where s is uniformly sampled by V and M_i is the topology matrix of C_i. Note that each cv^(j) will be checked individually, so we can focus on the case where A only has one faulty w_ext^(j). W.l.o.g., we assume the faulty one is w_ext^(1) and the associated compressed topology vector A committed is cv^(1). Recall that w_ext^(1) has already passed the Item 1 check. This implies that ∀i ∈ [B], M_i × glue(w_ext^(1), 1) ≠ 0. Recall V samples a uniformly random vector s to generate the compressed topology vectors cv_i, i ∈ [B] (Step 11). If there exists some i ∈ [B] such that (cv_i)^T × glue(w_ext^(1), 1) = 0, then A can trivially pass Item 3 by setting cv^(1) as cv_i. However, by Corollary 1, this only happens with probability at most B/p^q.
Now consider that ∀i ∈ [B], (cv_i)^T × glue(w_ext^(1), 1) ≠ 0. Since (cv^(1))^T × glue(w_ext^(1), 1) = 0 (Item 2 passed), this implies cv^(1) ∉ {cv_1, ..., cv_B}. We bound the probability that Item 3 does not catch this. Recall that Item 3 will first further compress cv_i to some single element ct_i ≜ cv_i^T × t for each i ∈ [B], where t is uniformly sampled by V (Step 14). Then, P and V execute a VOLE-based ZK proof (Step 15) showing that Π_{i∈[B]} (ct^(1) − ct_i) = 0, where ct^(1) denotes the compression of cv^(1) under t. A passes this check trivially if ct^(1) = ct_i for some i ∈ [B].
However, by Corollary 2, this only happens with probability at most B/p^q. In the case that this does not happen, A must break soundness of the last VOLE-based ZK with B − 1 multiplication gates, which is achieved by LPZK (and an extra opening on 0). The probability of this event is at most (B + 2)/p^q. Thus, the two distributions seen by the environment E differ with probability at most (Rn× + R + 3B + 6)/p^q, and any unbounded environment E cannot distinguish the real-world execution and ideal-world execution, except with probability at most (Rn× + R + 3B + 6)/p^q.
Malicious V. If S receives (false) from F^{R,B}_ZK, it simply aborts. If S receives (true) from F^{R,B}_ZK, it emulates F^{p,q}_sVOLE, selects R uniformly random extended witnesses w_ext^(1), ..., w_ext^(R), and acts as an honest P, except that it maliciously passes the checks in Steps 9, 13, and 15. S is able to pass these because it emulates F^{p,q}_sVOLE, and thus it knows ∆ as well as shares of A's IT-MACs (i.e., S knows what values A expects). The messages received by A are all uniformly distributed, and hence the distributions seen by the environment E in the two worlds are identical.
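For reference, the individual bounds used in the malicious-prover case add up to the soundness error of Theorem 2. The terms are, in order: the multiplication check (Item 1), the inner-product check (Item 2), the bad challenge s (Corollary 1), the bad challenge t (Corollary 2), and the final B-product LPZK proof:

```latex
\frac{Rn_\times + 2}{p^q}
+ \frac{R + 2}{p^q}
+ \frac{B}{p^q}
+ \frac{B}{p^q}
+ \frac{B + 2}{p^q}
= \frac{Rn_\times + R + 3B + 6}{p^q}
```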

D Detailed Cost Accounting
D.1 Single Disjunction Protocol Costs
Communication. We analyze the communication complexity of Π^{p,q}_Single (Figure 5) in the F^{p,q}_sVOLE-hybrid model. In our analysis, we count the number of transmitted F_p elements:
• In Steps 4, 5, 6, 7, P commits to her extended witness by transmitting (n_in + 3n×) elements.
• In Step 8, the call to LPZK requires V to transmit a random challenge. This challenge contains q elements, and P replies by transmitting 2q elements.
• In Step 9, V sends the random challenge s, which contains q(2n× + 1) elements.
• In Step 11, P and V run a small VOLE-based proof to handle the small product circuit. This requires the following communication: q(B − 1) elements from P to commit to intermediate wire values, q elements from P to open the circuit output, q elements from V for a random LPZK challenge, and 2q elements from P for the LPZK response.
Tallying these costs, the total communication is n_in + (2q + 3)n× + q(B + 7) = O(q|C| + qB) elements. We will soon show a simple variant that achieves O(|C| + qB) communication by sacrificing some soundness. This variant is far more friendly to circuits over small fields (e.g., Boolean).
4. V sends the random challenge for the second LPZK.
5. P sends the proof of the second LPZK and opens the final output (to prove it is 0).
can be efficiently achieved by the IT-MAC linear homomorphism: the parties simply subtract two supposedly-equal values, and then P proves that the result is an IT-MAC of zero. Thus, P can finish the extra proof by sending one field element per constraint. By leveraging a random oracle in a standard way, many such zero checks can be compressed into one element, yielding overall O(1) overhead. See [BMRS21] for details of this RO trick.

F Additional Evaluation
Robin vs. QuickSilver [YSWW21].Recall that we test our single disjunction protocol Robin and QuickSilver on branches that each have 8 million multiplication gates.Figure 13 tabulates the results of these experiments, which were used to generate Figure 8.
Robin vs. QuickSilver [YSWW21] in the online phase.Many previous VOLE-based ZK protocols only consider the online phase.Namely, they assume that VOLE correlations are "free", or can be viewed as preprocessing.So far, our experiments consider the full end-to-end runtime, including generating VOLE correlations.We also tested Robin's online phase and compared it with QuickSilver's; Figure 15 shows the results.
Batchman vs. repeating Robin vs. QuickSilver [YSWW21]. Recall that we compared Batchman, (repeated uses of) Robin, and QuickSilver on branches that each have 1.25 × 10^5 multiplication gates. Figure 14 tabulates the results of these experiments, which were used to generate Figure 10.
• Single disjunctions. Robin (Refined Oblivious Branching for INteractive zk) is a VOLE-based ZK for disjunctive statements expressed as an arithmetic circuit over an arbitrary field F. For a disjunction with B branches, each consisting of a maximum of |C| (multiplication) gates, P and V each compute O(B|C|) field operations and communicate O(B + |C|) field elements. More precisely, P and V communicate only O(B⌈λ / log |F|⌉ + |C|) field elements.
• Batched disjunctions. Batchman extends Robin to batches of R proofs of the same disjunction. Here, P and V each compute O(RB + R|C| + B|C|) field operations and communicate O(RB + R|C|) field elements (assuming log |F| = Ω(λ) where λ is the statistical security parameter).

Algorithm 1: MUL_LEFT takes as input (1) an arithmetic circuit C over a field F written in gate-by-gate representation (see Definition 3 in Appendix B) and (2) a vector s over some extension field of F with length 2n× + 1. It outputs s^T × M where M is the topology matrix associated with C. Array indices start at 1.
10. For each i ∈ [B], P and V compute cv_i := s^T × M_i ∈ (F^{n_in+3n×+1}_{p^q})^T, then compute [v_i] = cv_i^T × [w_ext].
11. P and V run a VOLE-based zero-knowledge proof to show Π_{i∈[B]} v_i = 0 by using IT-MAC [v]. Note that this is a B-product circuit defined over F_{p^q}, so it can be performed with {[τ_i]}_{i∈[B−1]} and LPZK. If ZKP succeeds, V outputs (true); otherwise, V outputs (false).

Figure 5: Robin: ZKP protocol for disjunctive circuit satisfiability over any field F_p in the F^{p,q}_sVOLE-hybrid model.
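The core check of Steps 10 and 11 can be illustrated on plain values, ignoring IT-MACs and zero knowledge. The two toy branches below (one multiplication gate each) and their dense topology matrices are invented for the example, and the dense vector-matrix product stands in for MUL_LEFT; the multiplication-gate consistency check of Step 8 is separate and not modeled here.

```python
P = 2**61 - 1  # example prime field, extension degree q = 1 assumed

def vec_mat(s, M, p=P):
    """Dense s^T x M (the protocol would use MUL_LEFT instead of a dense M)."""
    return [sum(s[i] * M[i][j] for i in range(len(s))) % p for j in range(len(M[0]))]

def dot(u, v, p=P):
    return sum(a * b for a, b in zip(u, v)) % p

def disjunction_product(s, matrices, w_ext, p=P):
    """Compute prod_i v_i with v_i = (s^T x M_i) x glue(w_ext, 1)."""
    aux = list(w_ext) + [1]
    prod = 1
    for M in matrices:
        prod = prod * dot(vec_mat(s, M, p), aux, p) % p
    return prod

# aux = (in1, l1, r1, o1, 1).
# Branch A: l1 = in1, r1 = in1,     out = o1 - 4  (i.e., in1^2 = 4)
# Branch B: l1 = in1, r1 = in1 + 1, out = o1 - 6  (i.e., in1 * (in1 + 1) = 6)
M_A = [[1, -1, 0, 0, 0], [1, 0, -1, 0, 0], [0, 0, 0, 1, -4]]
M_B = [[1, -1, 0, 0, 0], [1, 0, -1, 0, 1], [0, 0, 0, 1, -6]]
s = [3, 5, 7]                                               # stand-in for V's challenge
valid = disjunction_product(s, [M_A, M_B], [2, 2, 3, 6])    # satisfies branch B
invalid = disjunction_product(s, [M_A, M_B], [1, 1, 1, 1])  # satisfies neither branch
```

Here valid is 0 because the factor for branch B vanishes, while invalid is non-zero. With a uniformly random s, a non-witness makes some factor vanish only with the small probability bounded by Corollary 1.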
plots our speedup.[YSWW21] requires 73.7 MB communication per branch; Robin requires ≈ 200 MB communication for all branches.
Figure 10: The speedup of Batchman and Robin over QuickSilver [YSWW21]. We plot factor improvement in terms of end-to-end runtime. Circuits are defined over F_{2^61−1} and each branch has 1.25 × 10^5 multiplication gates.

where • denotes the element-wise product, 3. M × w = 0.
Proof. Immediate from Definition 1.
Number of required subfield VOLE correlations. Π^{p,q}_Single requires a total of n_in + 3n× + q(B + 1) = O(|C| + qB) subfield VOLE correlations, almost all of which are used in the initialization phase; 2q subfield VOLEs are required for the two LPZK instances.
Computation. The computation for each party is dominated by Step 10, where they each compute s^T × M_i and corresponding IT-MACs for each i ∈ [B]. By leveraging MUL_LEFT (cf. Section 4), the computation cost is O(B|C|) field operations. Other steps require either O(B) or O(|C|) operations.
5-round online phase. The VOLE correlations needed for LPZKs at Steps 8 and 11 can be parallelized with initialization. Viewing initialization as preprocessing, our protocol can be viewed as a 5-round online phase:
1. P commits to her extended witness.
2. V sends the random challenge for the first LPZK and s.
3. P sends the proof of the first LPZK and commits the intermediate values of the final product circuit.

Figure 15: The speedup of our single disjunction protocol Robin over QuickSilver [YSWW21] in the online phase. We measure end-to-end runtime. The circuits are defined over F_{2^61−1}, and each branch has 1.25 × 10^5 multiplication gates.

Table 1: Cost of recent VOLE-based ZK systems for batched disjunctions of arithmetic circuits. B denotes the number of branches, |C| denotes the branch size, and R denotes the batch size.
Table header labels: without VOLE; 1 thread, λ ≥ 100, with VOLE.

Comparison with Mac'n'Cheese [BMRS21]. We tabulate end-to-end runtime in seconds. Our reported numbers for [BMRS21] are directly from their paper. Rep. SHA2 denotes a circuit computing 45000 iterations of SHA-2. Mat. Mul denotes a circuit that multiplies two 1000 × 1000 Boolean matrices. Both circuits have ≈ 1 billion AND gates. [BMRS21] uses 124 MB of communication while ours uses 628 MB. As B increases, communication remains almost constant. The network has 30 Mbps bandwidth and 100 ms latency.
AntMan [WYY+22]. AntMan [WYY+22] presents a protocol optimized for batched SIMD circuits, but AntMan does not consider batched disjunctions. To implement batched disjunctions in AntMan, one can consider a size-B|C| instruction which is executed R times. Recall that AntMan incurs O(RB|C| log R) computation and O(B|C| + R) communication. Our batched protocol improves in terms of computation, incurring O(RB + R|C| + B|C|) computation and O(RB + R|C|) communication.

The speedup of our single disjunction protocol Robin over QuickSilver [YSWW21]. We report end-to-end proof runtime. Circuits are over $F_{2^{61}-1}$; each branch has 8M mult. gates.
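To get a feel for how the two asymptotic computation bounds diverge, one can evaluate their leading terms for sample parameters. The sketch below is a rough illustration only: it drops all hidden constants, assumes a base-2 logarithm, and uses hypothetical function names and parameters, so the resulting ratio is not a prediction of measured speedups.

```python
import math

def antman_ops(R, B, C):
    """Leading term R*B*|C|*log R of the bound stated for AntMan (log base 2 assumed)."""
    return R * B * C * math.log2(R)

def batchman_ops(R, B, C):
    """Leading terms R*B + R*|C| + B*|C| of the bound stated for the batched protocol."""
    return R * B + R * C + B * C

# Hypothetical parameters: 1024 repetitions, 8 branches, 2^21 gates per branch.
R, B, C = 1024, 8, 2**21
ratio = antman_ops(R, B, C) / batchman_ops(R, B, C)
print(f"term-count ratio: {ratio:.1f}")
```

Because the second expression has no term in which R, B and |C| all multiply together, the ratio grows as any of the three parameters increases while the other two stay fixed.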
confirms MUL LEFT's high

Comparison with AntMan [WYY+22]. We tabulate millions of multiplication gates executed per second (mgps). AntMan-t refers to AntMan with t threads (numbers from [WYY+22]). Batchman uses only 1 thread. Batchman-(B, C) refers to our batched protocol Batchman with B branches where each branch has C multiplication gates. Both protocols execute batches where each repetition has $2^{21}$ multiplication gates.
1. The multiplication check performed by LPZK in Step 8.
2. The inner-product check performed by QS in Step 13.
3. The compressed topology token membership check performed by a random challenge and regular VOLE-based ZK in Steps 14, 15, 16.

If A's witness does not satisfy the multiplication relation, A can pass Item 1 with probability at most $Rn_\times + 2$ [YSWW21].

QuickSilver [YSWW21]. We tabulate end-to-end runtime in seconds. Figure 8 plots these results.
We measure end-to-end runtime. The circuits are defined over $F_{2^{61}-1}$, and each branch has $1.25 \times 10^5$ multiplication gates.

Figure 16: Fine-grained analysis of our single disjunction protocol Robin. Measurements are in seconds.