Meteor: Improved Secure 3-Party Neural Network Inference with Reducing Online Communication Costs

Secure neural network inference has been a promising solution to private Deep-Learning-as-a-Service (DLaaS), which enables the service provider and user to execute neural network inference without revealing their private inputs. However, the expensive overhead of current schemes is still an obstacle in real applications. In this work, we present Meteor, an online communication-efficient and fast secure 3-party computation (3PC) neural network inference system against a semi-honest adversary in the honest-majority setting. The main contributions of Meteor are two-fold: i) We propose a new and improved 3-party secret sharing scheme stemming from the linearity of replicated secret sharing, and design efficient protocols for the basic cryptographic primitives, including linear operations, multiplication, most significant bit extraction, and multiplexer. ii) Building on these primitives, we construct efficient and secure blocks for the widely used neural network operators such as Matrix Multiplication, ReLU, and Maxpool, along with several specific optimizations for better efficiency. Our total communication including the setup phase is slightly larger than that of SecureNN (PoPETs'19) and Falcon (PoPETs'21), two state-of-the-art solutions, but the gap is not significant when the online phase must be optimized as a priority. Using Meteor, we perform extensive evaluations on various neural networks. Compared to SecureNN and Falcon, we reduce the online communication costs by up to 25.6× and 1.5×, and improve the running-time by at most 9.8× (resp. 8.1×) and 1.5× (resp. 2.1×) in LAN (resp. WAN) for the online inference phase.


INTRODUCTION
In the Deep-Learning-as-a-Service (DLaaS) paradigm, the service provider offers a trained neural network (NN), and a user calls a well-defined API for data analysis. Aiming to alleviate the privacy concerns associated with DLaaS [1,3], existing works have introduced secure computation to enable Secure Inference. Secure inference exploits cryptographic primitives to ensure that the only information available to the user is the inference result, and nothing more is revealed to either party.
Secure inference protocols can provide strong privacy protection, but the key concern is how to obtain privacy with satisfactory efficiency. Note that different cryptographic tools offer different characteristics and trade-offs. In particular, fully homomorphic encryption (FHE)-based methods are efficient in communication but still limited by expensive computation burdens [29,30,34]. Garbled circuits [73] (GC)-based schemes only require a constant number of rounds of interaction but have a high communication overhead and are expensive for arithmetic operations [5,59]. Secret sharing [64]-based approaches provide efficient arithmetic operations and support non-linear functions [46,49,50,60,70,71] using much less communication, yet usually require a number of interaction rounds proportional to the depth of Multiplication (MULT) gates. Among the secret sharing-based works, 2-out-of-3 replicated secret sharing-based secure 3-party computation (3PC) approaches [49,71] have achieved significant improvements and gained much attention.
However, the online communication of replicated secret sharing is still the efficiency bottleneck, even in the semi-honest model. The costly online communication limits the users' query throughput, especially in WAN. Therefore, improving online communication (and running-time) is both challenging and a priority in real applications.
The online communication mainly stems from MULT, and we analyze the detailed costs in the following aspects: i) Costs of Resharing: Multiplying two ℓ-bit integers (2-MULT) generates 3-out-of-3 secret-shared intermediate results. Resharing these 3-out-of-3 shares into 2-out-of-3 shares, to maintain correctness and consistency, requires interactive communication of ℓ bits per party in 1 round; ii) Costs of 2-MULT with Faithful Truncation: When evaluating 2-MULT on two ℓ-bit fixed-point inputs, the parties need to truncate the product to avoid overflow. However, the best known protocol for 2-MULT equipped with faithful truncation needs an online communication of 4ℓ/3 bits per party in 1 round (incurring ℓ/3 more bits than 2-MULT on integers); iii) Costs of k-MULT: k-MULT takes k ℓ-bit integers as inputs and multiplies them to produce the product. k-MULT plays an important role in extracting the most significant bit, but existing works achieve k-MULT by utilizing 2-MULT in a tree manner. This incurs an online communication of (k − 1)ℓ bits in ⌈log_2 k⌉ rounds.
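As a concrete illustration of the resharing cost in i) above, the following is a minimal plaintext-level sketch (not the paper's implementation) of 2-out-of-3 replicated sharing and the 2-MULT resharing step. The network exchange is modeled by list indexing, the re-randomization of the 3-out-of-3 shares with a fresh zero-sharing is omitted for brevity, and all function names are illustrative.

```python
import random

MOD = 1 << 64  # the ring Z_{2^l} with l = 64

def share(x):
    """2-out-of-3 replicated sharing: x = x0 + x1 + x2, party i holds (x_i, x_{i+1})."""
    x0, x1 = random.randrange(MOD), random.randrange(MOD)
    x2 = (x - x0 - x1) % MOD
    xs = [x0, x1, x2]
    return [(xs[i], xs[(i + 1) % 3]) for i in range(3)]

def reconstruct(shares):
    # the first components of the three parties are the three additive shares
    return sum(sh[0] for sh in shares) % MOD

def mult_2(sx, sy):
    """2-MULT: party i locally computes z_i = x_i*y_i + x_{i+1}*y_i + x_i*y_{i+1};
    the three z_i form a 3-out-of-3 additive sharing of x*y.  Resharing them
    into 2-out-of-3 form (the l bits / 1 round of communication per party) is
    modeled by handing z_{i+1} to party i; re-randomization is omitted here."""
    z = [(sx[i][0] * sy[i][0] + sx[i][1] * sy[i][0] + sx[i][0] * sy[i][1]) % MOD
         for i in range(3)]
    return [(z[i], z[(i + 1) % 3]) for i in range(3)]
```

Each local product z_i covers all nine cross terms x_j · y_k exactly once across the three parties, which is why the three z_i sum to x · y.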
In this paper, we focus on improving the online communication costs of the latter two kinds of MULT gates. Although our method requires more costs in the setup phase (i.e., communicating (2^k − 1 − k)ℓ bits in ⌈log_2 k⌉ rounds per party for k-MULT), our significant improvements in the online phase are beneficial in real applications and might be of independent interest.
Our Techniques We propose Meteor, an online communication-efficient and fast secure neural network inference system. Meteor achieves its performance improvements via our improved 3PC protocols and specific optimizations for secure NN operators. Following previous works [70,71], our 3PC protocols are secure against a semi-honest adversary in the honest-majority setting. We build several primitives with a focus on online efficiency by exploiting a function-dependent but input-independent setup.
Our construction is similar to the sharing semantics of ABY2.0 [54], but exploits a different perspective based on the linearity of replicated secret sharing [6,49,71]: for MULT, we only need linear operations of replicated secret sharing to generate 2-out-of-3, instead of 3-out-of-3, secret shares in the online phase. This new perspective accelerates 2-MULT for fixed-point inputs and k-MULT for integer inputs, and brings several further optimizations for more complex primitives, such as most significant bit extraction (MSB Ext.) and multiplexer (MUX). Besides, our linearity perspective generalizes more straightforwardly to any kind of linear secret sharing. A detailed comparison is shown in § 3.1.
Contributions Formally, we have the following contributions: • Improved Secure 3-Party Computation: We propose an improved 3PC secret sharing scheme (⟦·⟧-sharing) and construct a set of basic cryptographic primitives, including linear operations (Lops), MULT, MSB Ext., MUX, etc. Our primitives are more online communication- and round-efficient than those of SecureNN [70] and Falcon [71]. The detailed theoretical improvements are shown in Table 1.
Organization We present the background and preliminaries in § 2, and give a high-level overview of Meteor in § 3. We propose efficient protocols for the basic primitives in § 4 and justify their security in § 5. In § 6, we construct the optimized secure NN operators. The experimental results are illustrated in § 7. We discuss related works in § 8 and conclude this work in § 9.

BACKGROUND & PRELIMINARIES
We introduce the background and preliminaries about neural network and 3PC replicated secret sharing in this section.

Notations
The main notations are summarized in Table 2.

Neural Network
The computational flow of a neural network is composed of multiple linear and non-linear layers. Each layer receives an input and processes it to produce an output that serves as input to the next layer.
Linear Layers Typical linear layers in NN inference include Fully-Connected (FC), Convolution (CONV), and Batch Normalization (BatchNorm, which is a linear layer only during NN inference): • FC: Given an input vector x ∈ R^{n×1}, an FC layer generates the output y ∈ R^{m×1} as y = Wx + b, where W ∈ R^{m×n} is the weight matrix and b ∈ R^{m×1} is the bias term. More generally, neural networks often take a batch of images as input X ∈ R^{n×|B|} (|B| is the batch size); thus the FC layer can be computed with matrix multiplication as Y = WX + B. • CONV: The CONV layer computes the dot product of a small weight matrix (filter) and the neighborhood of an element of the input. Each filter slides over the input with a certain stride, and the size of the filter is called the filter size. For a generalized exposition on CONV, please refer to [70]. • BatchNorm: A BatchNorm layer is typically applied to shift its input to amenable ranges. During the inference, the BatchNorm parameters γ and β are fixed, and BatchNorm normalizes its input x as γ · x + β.
(Notation from Table 2: [·]: 3-out-of-3 sharing; ⟨·⟩: 2-out-of-3 replicated secret sharing; ⟦·⟧: our 3PC secret sharing; Z_{2^ℓ}: discrete ring modulo 2^ℓ; F_p: field modulo prime p; F_(·): the ideal functionality for (·); ∈_R: random sampling.)
Non-Linear Layers NN inference uses activation functions to model non-linear relationships between input and output, and pooling functions are sometimes applied as well.
• Activation: The activation functions are applied element-wise. One of the most popular activation functions is the ReLU function: ReLU(x) = max(0, x). Other activation functions include Sigmoid, Tanh, etc. [52]; • Pool: Pooling arranges inputs into several windows and aggregates the elements of each window. Maxpool (resp. Avgpool) calculates the maximum (resp. average) of each window.
Multiplication Functionality F^⟨·⟩_MULT multiplies two shared values ⟨x⟩ and ⟨y⟩; the existing protocol achieves this as follows: i) First, P_i computes z_i = x_i y_i + x_{i+1} y_i + x_i y_{i+1} locally such that z is [·]-shared. ii) Then, the parties reshare the [·]-shared z into ⟨z⟩, which costs ℓ bits of communication per party in 1 round.

MSB Extraction
The key step of comparing ⟨x⟩ ≥ ⟨y⟩ in two's complement representation is extracting the most significant bit of ⟨z⟩ = ⟨x⟩ − ⟨y⟩. General methods either re-interpret the arithmetic sharing as boolean sharing and evaluate an addition circuit on boolean shares to compute ⟨msb(z)⟩_2, or employ garbled circuits to extract the most significant bit. Recently, Wagh et al. proposed an efficient MSB Ext. method based on the wrap function and bit decomposition in replicated secret sharing [71]. We follow their approach in Meteor but optimize the online efficiency with our improved secret sharing scheme. We plan to improve other MSB Ext. methods [28,47,49] with our novel secret sharing in future work.
Fixed-Point Representation In secure NN inference, we need to encode floating-point numbers as integers in rings [49,50,71]. Given a floating-point x̃ ∈ R, its encoding is x = ⌊2^f · x̃⌋ (mod 2^ℓ), where typically ℓ = 64 and f = 13 as in [71]. In this way, we use [0, 2^{ℓ−1}) to represent x̃ ∈ R^+, and [2^{ℓ−1}, 2^ℓ) for negative values.
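The fixed-point encoding above can be sketched as follows, with ℓ = 64 and f = 13 as in the text. The `trunc_mult` helper (an illustrative name, not from the paper) shows why the product of two encodings must be divided by 2^f; here it is a local, error-free truncation, unlike the secure truncation variants discussed later.

```python
L, F = 64, 13
MOD = 1 << L

def encode(v):
    """Float -> ring element in two's-complement encoding: x = round(v * 2^F) mod 2^L."""
    return int(round(v * (1 << F))) % MOD

def decode(x):
    """Ring element -> float, interpreting [2^(L-1), 2^L) as negative values."""
    if x >= MOD // 2:
        x -= MOD
    return x / (1 << F)

def trunc_mult(x, y):
    """Multiply two fixed-point encodings; the raw product carries scale 2^(2F),
    so it is truncated by 2^F (signed arithmetic shift) to restore scale 2^F."""
    z = (x * y) % MOD
    if z >= MOD // 2:          # re-center before shifting so negatives round down
        z -= MOD
    return (z >> F) % MOD
```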

A HIGH-LEVEL OVERVIEW OF METEOR
We first present an overview of our ⟦·⟧-sharing semantics in § 3.1. Then, we show the design and threat model of Meteor in § 3.2.

Overview of ⟦·⟧-Sharing Semantics
Costs Analysis of MULT Existing ⟨·⟩-sharing based 3PC approaches need ℓ bits per party in 1 round for evaluating 2-MULT on 2 integer inputs (resharing), and require more costs for 2-MULT with fixed-point inputs and k-MULT with integer inputs: when multiplying 2 fixed-point inputs x and y, the parties need to reveal [z + r′] and compute (z + r′)/2^f − ⟨r⟩ for faithful truncation, where (⟨r⟩, ⟨r′⟩) is a correlated random truncation pair with r = r′/2^f.
Linearity of ⟨·⟩-Sharing From § 2.3, we notice that linear operations on ⟨·⟩-values lead to ⟨·⟩-shared results locally (no communication). This is true for two ⟨·⟩-shared inputs (c_1⟨x⟩ + c_2⟨y⟩ + c_3), and can be easily generalized to three or more ⟨·⟩-shared inputs.
⟦·⟧-Sharing Inspired by the sharing semantics of [9,10,67] and with the linearity of ⟨·⟩-sharing in mind, we propose an improved 3PC secret sharing (⟦·⟧-sharing): a value x ∈ Z_{2^ℓ} is ⟦·⟧-shared as ⟦x⟧ = (m_x, ⟨λ_x⟩), where the masked value m_x = x − λ_x is public to all parties and the random mask λ_x is ⟨·⟩-shared among them. Similarly, ⟦·⟧_p-sharing is for x ∈ F_p and ⟦·⟧_2-sharing is for x ∈ Z_2, where we use modulo p in ⟦·⟧_p-sharing and replace +, − by ⊕ and · by ∧ in ⟦·⟧_2-sharing.
With ⟦·⟧-sharing, we can evaluate MULT by computing the relatively expensive multiplication of the secret random ⟨λ⟩s in the setup phase, such that the online phase only involves the linear operations of ⟨·⟩-sharing. Taking 2-MULT(x, y) with integer inputs as an example, the parties compute ⟨λ_xλ_y⟩ = ⟨λ_x⟩ · ⟨λ_y⟩ in the setup phase, and compute m_z with linear operations and 1 round of revealing. ⟦·⟧-sharing also needs ℓ bits per party, but gives the following benefits: • For 2-MULT with fixed-point inputs, we only need ℓ bits per party in 1 round for mask-and-reveal in the online phase. This is because the intermediate results are in ⟨·⟩-shared fashion. Hence, we improve the communication by 1.3×; • For k-MULT with integer inputs, we only need linear operations of ⟨·⟩-sharing to generate the ⟨·⟩-shared product of k integers in the online phase, since all multiplications among the ⟨λ⟩s can be evaluated in the setup phase. Therefore, our approach needs an online communication of ℓ bits per party in 1 round, which is independent of k and first achieved in the regime of 3PC. Compared to prior methods, we improve the online communication by (k − 1)× and the rounds by ⌈log_2 k⌉×. What's more, we propose efficient ⟦·⟧-sharing based protocols for other primitives and NN operators in § 4 and § 6, respectively.
Comparison to ABY2.0 [54]. Patra et al. have proposed similar sharing semantics to improve the online efficiency of 2PC [54], inspired by ASTRA [18] and [53], but our ⟦·⟧-sharing is different in the following aspects: • Beaver-Friendly v.s. Linearity: ABY2.0 is inspired by Beaver triples [7] and reduces the communication by sharing the inputs in a Beaver-friendly format. However, ⟦·⟧-sharing stems from the linearity of replicated secret sharing. They might be equivalent in some settings (e.g., 2PC), but our linearity perspective generalizes more straightforwardly to other linear secret sharing schemes. • 2PC v.s. 3PC: For the setup phase, ABY2.0 exploits Oblivious Transfer (OT) [32] or HE [25] to generate correlated randomness, but we utilize the multiplication protocol of ⟨·⟩-sharing (free of OT or HE). Therefore, ⟦·⟧-sharing is more efficient in setup when an honest majority in 3PC is available.
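The masked-sharing semantics described in this section can be simulated in plaintext as follows; the replicated ⟨λ_x⟩ is simplified to a plain 3-way additive sharing, there is no actual networking, and all function names are illustrative rather than the paper's protocol names.

```python
import random

MOD = 1 << 64

def rand_mask():
    """Sample a random mask lam together with an additive 3-party sharing of it
    (standing in for the replicated <lam>-sharing of the paper)."""
    parts = [random.randrange(MOD) for _ in range(3)]
    return sum(parts) % MOD, parts

def share(x):
    """[[x]] = (m_x, <lam_x>): m_x = x - lam_x is public, lam_x stays secret-shared."""
    lam, lam_shares = rand_mask()
    m = (x - lam) % MOD
    return m, lam_shares

def open_value(m_x, lam_shares):
    """Reconstruct lam_x from its shares, then recover x = m_x + lam_x."""
    lam = sum(lam_shares) % MOD
    return (m_x + lam) % MOD
```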

Design of Meteor
Our Meteor, as depicted in Figure 1, consists of three layers: • I: We first propose an improved 3-party secret sharing scheme (⟦·⟧-sharing) inspired by the linearity of replicated secret sharing in 3PC. • II: Secondly, we design efficient protocols for the most common basic cryptographic primitives, i.e., Linear Operations (Lops), MULT, MSB Ext., and MUX. • III: Thirdly, we build secure blocks for the widely used NN operators, such as MatMul, ReLU, and Maxpool (MP), together with specific optimizations to support fast secure NN inference.
Function-dependent but Input-independent Setup Following [70,71], we also focus on online efficiency. Meteor is cast into a function-dependent but input-independent setup phase and an input-dependent online phase, as in [19,54,55]. In the setup phase, we generate the function-dependent but input-independent correlated randomness for a given function to improve online efficiency. This setup is available and widely utilized in many applications.
Threat Model Following prior works [70,71], Meteor resists semi-honest adversaries in the honest-majority setting [45]. Namely, each party follows the protocol, but may individually try to learn information about the other parties' inputs:
Definition 2 (Semi-Honest Security). Let Π be a three-party protocol running in the real world and F : ({0, 1}*)^3 → ({0, 1}*)^3 be the ideal randomized functionality. We say Π securely computes F in the presence of a single semi-honest adversary if for every corrupted party P_i (i ∈ {0, 1, 2}) and every input x ∈ ({0, 1}*)^3, there exists an efficient simulator S such that {S(x_i, F_i(x)), F(x)} is computationally indistinguishable from {View_i^Π(x), Output^Π(x)}, where View_i^Π(x) is the view of P_i in Π, Output^Π(x) is the output of all parties, and F_i(x) denotes the i-th output of F(x).

IMPROVED SECURE 3-PARTY COMPUTATION
In this section, we present the detailed constructions of sharing and reconstruction, linear operations, multiplication, secure MSB extraction, and multiplexer for ⟦·⟧-sharing.

Sharing and Reconstruction
Sharing Π_SHARE(x) achieves F^⟦·⟧_SHARE by enabling P_i (the secret owner) to generate a ⟦·⟧-sharing of its x. In the setup phase, all parties together sample a random ⟨λ_x⟩ using the existing F^⟨·⟩_RAND, with P_i getting λ_x in clear (c.f. Appendix A). In the online phase, P_i reveals m_x = x − λ_x.
Reconstruction Π_REC(⟦x⟧) achieves F^⟦·⟧_REC, which reconstructs x as follows: given ⟦x⟧, parties invoke F^⟨·⟩_REC to reconstruct λ_x and locally compute x = m_x + λ_x.

Linear Operations
⟦·⟧-sharing is linear in the sense that, given ⟦x⟧, ⟦y⟧, and public constants c_1, c_2, and c_3, parties can compute ⟦z⟧ = c_1 · ⟦x⟧ + c_2 · ⟦y⟧ + c_3 by locally setting m_z = c_1 · m_x + c_2 · m_y + c_3 and ⟨λ_z⟩ = c_1 · ⟨λ_x⟩ + c_2 · ⟨λ_y⟩.
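The local evaluation rule can be sketched in plaintext (mask shares stand in for the ⟨λ⟩-shares; names are illustrative): the public masked values and the mask shares are combined with the same coefficients, and the public constant is folded into m_z only, so no party communicates.

```python
import random

MOD = 1 << 64

def share(x):
    lam = [random.randrange(MOD) for _ in range(3)]   # additive shares of the mask
    m = (x - sum(lam)) % MOD                          # public masked value
    return m, lam

def open_value(m, lam):
    return (m + sum(lam)) % MOD

def lin(c1, X, c2, Y, c3):
    """z = c1*x + c2*y + c3 with no communication: m_z = c1*m_x + c2*m_y + c3,
    and each mask share is scaled as lam_z = c1*lam_x + c2*lam_y."""
    (mx, lx), (my, ly) = X, Y
    mz = (c1 * mx + c2 * my + c3) % MOD
    lz = [(c1 * a + c2 * b) % MOD for a, b in zip(lx, ly)]
    return mz, lz
```

Opening the result gives m_z + λ_z = (c_1 m_x + c_2 m_y + c_3) + (c_1 λ_x + c_2 λ_y) = c_1 x + c_2 y + c_3, as required.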

Multiplication
We consider 2-input multiplication (2-MULT) and k-input multiplication (k-MULT). The former is employed in secure FC/CONV, while the latter plays an important role in secure MSB Ext.

4.3.1 2-Input Multiplication.
We first consider the multiplication of two integers. Given the ⟦·⟧-shares of integers x and y, functionality F^⟦·⟧_2-MULT generates ⟦z⟧ with z = xy. For ⟦z⟧, we will need m_z = m_x m_y + m_x λ_y + m_y λ_x + λ_xλ_y − λ_z. In the setup phase, parties compute the input-independent ⟨λ_xλ_y⟩ = ⟨λ_x⟩⟨λ_y⟩. And in the online phase, parties compute ⟨m_z⟩ locally and collaboratively reveal it. So the challenge is reduced to generating ⟨λ_xλ_y⟩ given ⟨λ_x⟩ and ⟨λ_y⟩. We leverage F^⟨·⟩_MULT to accomplish this task as in § 2.3. The protocol is in Figure 2. Π_2-MULT needs an online communication of ℓ bits per party in 1 round.
Fixed-Point Multiplication Extension As analyzed in § 3, we truncate the product (i.e., xy/2^f where x and y are in fixed-point form) after each multiplication in secure NN inference. The existing faithful truncation method [49,71] needs 4ℓ/3 bits per party in the online phase. To reduce the costs, we propose online-free faithful truncation at the same online communication as 2-MULT for integers: i) In the setup phase, parties generate (⟨λ_z⟩, ⟨λ′_z⟩) with λ_z = λ′_z/2^f using the optimized binary circuits [49]. ii) In the online phase, parties compute and reveal ⟨m′_z⟩ = m_x m_y + m_x⟨λ_y⟩ + m_y⟨λ_x⟩ + ⟨λ_xλ_y⟩ − ⟨λ′_z⟩, and set m_z = m′_z/2^f.
Online Communication The correctness and precision guarantees of our method are similar to prior works [49,71]; our main contributions here lie in the online communication improvements. As ⟨m′_z⟩ is in ⟨·⟩-sharing, our Π_2-MULT with truncation for fixed-point inputs needs ℓ bits per party in 1 round for revealing it during the online phase, achieving 1.3× improvements.
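The whole 2-MULT dataflow can be simulated in plaintext as follows. For brevity, the setup material is computed in clear here; in the protocol, ⟨λ_xλ_y⟩ comes from F^⟨·⟩_MULT and the fresh ⟨λ_z⟩ from shared randomness, and all function names are illustrative.

```python
import random

MOD = 1 << 64

def share(x):
    lam = [random.randrange(MOD) for _ in range(3)]
    return (x - sum(lam)) % MOD, lam

def open_value(m, lam):
    return (m + sum(lam)) % MOD

def mult(X, Y):
    """[[z]] with z = x*y.  Setup: lam_x*lam_y and a fresh mask lam_z are
    prepared input-independently (in clear here).  Online: the parties combine
    public values with their mask shares and jointly reveal m_z, the single
    round of l bits per party."""
    (mx, lx), (my, ly) = X, Y
    lam_x, lam_y = sum(lx) % MOD, sum(ly) % MOD
    lam_xy = (lam_x * lam_y) % MOD                    # setup: <lam_x><lam_y>
    lz = [random.randrange(MOD) for _ in range(3)]    # setup: fresh <lam_z>
    # online: m_z = m_x*m_y + m_x*lam_y + m_y*lam_x + lam_x*lam_y - lam_z
    mz = (mx * my + mx * lam_y + my * lam_x + lam_xy - sum(lz)) % MOD
    return mz, lz
```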

4.3.2 k-Input Multiplication. Functionality F^⟦·⟧_k-MULT multiplies k integers for any positive constant k. From the fact that the secret random ⟨λ⟩s are input-independent, we can multiply them in the setup phase. Therefore, we only need linear operations on ⟨·⟩-shared values to get the ⟨·⟩-shared product of k integers in the online phase.
In the setup phase, parties can compute the input-independent ⟨·⟩-shares of {Π_{i∈T} λ_{x_i}}_{T ⊆ {1,...,k}} exploiting F^⟨·⟩_MULT. In the online phase, the parties only need to reveal m_z. The details are in Figure 5.
Online Communication The online communication remains just ℓ bits per party in 1 round, independent of the fan-in k. In contrast, previous ⟨·⟩-sharing based methods require (k − 1)ℓ bits per party in ⌈log_2 k⌉ rounds. In the setup phase, the above method requires (2^k − 1 − k)ℓ bits per party in ⌈log_2 k⌉ rounds. To balance the burden between the setup and online phases, we set k = 3 and 4 as in [54].
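A plaintext sketch of the k-MULT expansion: the product Π_i (m_i + λ_i) is expanded over all subsets T of the inputs, the λ-subset products are treated as setup material (computed in clear here), and the online step is only a linear combination plus one reveal. Names are illustrative.

```python
import random
from itertools import combinations
from functools import reduce

MOD = 1 << 64

def share(x):
    lam = [random.randrange(MOD) for _ in range(3)]
    return (x - sum(lam)) % MOD, lam

def open_value(m, lam):
    return (m + sum(lam)) % MOD

def mult_k(inputs):
    """k-MULT: prod_i (m_i + lam_i) = sum over subsets T of
    (prod_{i not in T} m_i) * (prod_{i in T} lam_i).  The 2^k - 1 - k
    non-trivial lam-subset products are the setup material; online, the
    parties compute m_z = product - lam_z linearly and reveal it."""
    k = len(inputs)
    ms = [m for m, _ in inputs]
    lams = [sum(l) % MOD for _, l in inputs]
    lz = [random.randrange(MOD) for _ in range(3)]    # fresh output mask
    mz = -sum(lz)
    for t in range(k + 1):
        for T in combinations(range(k), t):
            m_part = reduce(lambda a, i: a * ms[i] % MOD,
                            (i for i in range(k) if i not in T), 1)
            l_part = reduce(lambda a, i: a * lams[i] % MOD, T, 1)
            mz += m_part * l_part
    return mz % MOD, lz
```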

Secure MSB Extraction
Given ⟦x⟧, functionality F^⟦·⟧_SecMSB extracts ⟦msb(x)⟧_2 securely. From x = m_x + λ_x for ⟦x⟧, we can write msb(x) = msb(m_x) ⊕ msb(λ_x) ⊕ c, where c is the carry bit of m_x and λ_x modulo 2^{ℓ−1} (ignoring their MSBs), which is formalized as c = 1(2m_x + 2λ_x ≥ 2^ℓ) = 1(2λ_x ≥ 2^ℓ − 2m_x). Let s = 2λ_x and b = 2^ℓ − 2m_x; our key insights are as follows: i) λ_x is independent of the inputs, so we can compute ⟦msb(λ_x)⟧_2 and the ⟦·⟧_2-shares of the bits of s in the setup phase. ii) m_x and b are public in the online phase.
With our key insights in mind, we propose protocol Π_SecMSB as Figure 3. In the setup phase, we generate ⟦msb(λ_x)⟧_2 and the ⟦·⟧_2-shared bits of s using F^⟦·⟧_PreMSB. We construct protocol Π_PreMSB (c.f., Appendix § C) for F^⟦·⟧_PreMSB based on [49,71]. In the online phase, the challenge is computing c. Inspired by [71], we propose an optimized method as Figure 3, whose key point is evaluating the comparison c = 1(s ≥ b) on the shared bits of s against the public b.

Multiplexer
Given ⟦x⟧, ⟦y⟧, and ⟦v⟧_2, functionality F^⟦·⟧_MUX outputs ⟦z⟧ with z = x if v = 1, and z = y otherwise. This is z = (x − y) · v + y. Let u = x − y; the challenge is computing ⟦u⟧ · ⟦v⟧_2. A trivial solution is converting ⟦v⟧_2 to ⟦v⟧ by [28,61] and computing ⟦u⟧ · ⟦v⟧ with 1 + ℓ bits per party in 2 rounds. To reduce the costs, we propose protocol Π_MUX. Denote the value of bit v embedded in Z_{2^ℓ} as v^ℓ.
In the setup phase, we compute the ⟨·⟩-shares related to λ^ℓ_v (c.f., Appendix § D) and compute ⟨λ_u λ^ℓ_v⟩ using F^⟨·⟩_MULT. In the online phase, parties compute and reveal m_{u·v^ℓ}, set ⟦u · v^ℓ⟧ = (m_{u·v^ℓ}, ⟨λ_{u·v^ℓ}⟩), and output ⟦z⟧ = ⟦u · v^ℓ⟧ + ⟦y⟧, as shown in Figure 4.
Online Communication Π_MUX needs ℓ bits per party in 1 round.
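The arithmetic identity behind the MUX, and how secure ReLU (§ 6.2) is later built on it, can be checked in plaintext (two's-complement encoding with ℓ = 64; a local sketch of the identities, not the protocol itself):

```python
MOD = 1 << 64

def mux(x, y, v):
    """z = (x - y)*v + y: selects x when the bit v = 1 and y when v = 0."""
    return ((x - y) * v + y) % MOD

def msb(x):
    """Most significant bit of the two's-complement encoding (1 iff negative)."""
    return (x >> 63) & 1

def relu(x):
    """ReLU via MUX: keep x when msb(x) XOR 1 = 1 (i.e. x >= 0), else output 0."""
    return mux(x, 0, msb(x) ^ 1)
```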

OPTIMIZED SECURE NN OPERATORS
In § 6.1, we present secure FC, CONV, and BatchNorm. In § 6.2, we construct secure ReLU. We give private MP and the equivalent ReLU-MP switching in § 6.3. The full protocols are shown in Appendix E.

Secure Matrix Multiplication
Protocol Π_2-MULT can be easily vectorized to Π_MatMul. Given ⟦X⟧ = (m_X, ⟨λ_X⟩) of dimension a × b and ⟦Y⟧ = (m_Y, ⟨λ_Y⟩) of dimension b × c: i) In the setup phase, parties execute F^⟨·⟩_MULT to compute ⟨λ_Xλ_Y⟩ = ⟨λ_X⟩ · ⟨λ_Y⟩. ii) In the online phase, parties locally compute ⟨m_Z⟩ and reveal it. We need a · c · ℓ bits (independent of the inner dimension b) in 1 round for the online phase. The full protocol is shown in Figure 8. The security of Π_MatMul follows in the F^⟨·⟩_MULT-hybrid model. Since BatchNorm during inference is a fixed affine map, it can be fused with the preceding FC/CONV layer; therefore, we can compute both layers together at the same cost as secure CONV.

Secure ReLU
The activation function considered in this work is the rectified linear unit (ReLU). Taking ⟦x⟧ as input, F^⟦·⟧_ReLU returns ⟦x⟧ if x ≥ 0, and ⟦0⟧ otherwise. To achieve F^⟦·⟧_ReLU securely, it suffices to first extract ⟦msb(x)⟧_2 using F^⟦·⟧_SecMSB, and then execute F^⟦·⟧_MUX(⟦x⟧, ⟦y⟧, ⟦msb(x)⟧_2 ⊕ 1) with y = 0. The details are shown in Figure 9.

Secure Maxpool
Given a ⟦·⟧-shared vector x = (x_1, x_2, ..., x_n) of size n, the goal of functionality F^⟦·⟧_MP is to compute the maximum value among the elements. F^⟦·⟧_MP can be implemented on top of ReLU. The key point is that the parties update max = x_i if and only if ReLU(max − x_i) = 0 (⇔ max < x_i). The full protocol is shown in Figure 10.
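The update rule can be checked in plaintext: max(a, b) = b + ReLU(a − b), so a running maximum needs only additions, subtractions, and ReLU (two's-complement encoding with ℓ = 64; a sketch of the arithmetic, not the secure protocol, and names are illustrative):

```python
MOD = 1 << 64

def relu(x):
    # plaintext ReLU on the two's-complement ring encoding
    return x if x < MOD // 2 else 0

def maxpool(window):
    """Running maximum mirroring the secure update rule: the current maximum
    is replaced exactly when ReLU(max - x_i) = 0, i.e. when max < x_i."""
    cur = window[0]
    for x in window[1:]:
        # max(cur, x) = x + ReLU(cur - x)
        cur = (x + relu((cur - x) % MOD)) % MOD
    return cur
```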

Online Costs of Micro Benchmarks
We present the online costs of the NN operators, including MatMul, CONV, ReLU, and MP, in Table 3.

Online Costs of Single Inference
Evaluation on MNIST We perform experiments on NNs over MNIST as SecureNN [70] and Falcon [71], and the results are illustrated in Table 4. Our running-time improvements in LAN are less pronounced than our communication improvements. The reason is that in LAN, the overall time is more restricted by the computation burden, since there is enough bandwidth and a small rtt [71]. Therefore, the improvements in communication gain limited running-time benefits when the NN is computationally expensive. However, as MPC protocols are more likely to be executed in WAN, our improvements are meaningful in practical applications. The improvements mainly stem from two aspects: i) our efficient Π_MatMul improves the online communication of linear layers by 1.3×. ii) More importantly, our online communication- and round-efficient protocols Π_SecMSB and Π_MUX improve the online efficiency of the secure ReLU and MP functions.

Online Costs of Batch Inference
In this section, we measure the amortized online running-time for batch inference and show our improvements in scalability.
Amortized Running-Time Table 6 shows the running-time of Meteor over a batch of 128 images on AlexNet and VGG16 in the LAN and WAN settings. For AlexNet, the amortized per-image time drops from 0.429s to 0.146s (2.9× improvement) in LAN and from 8.997s to 1.588s (5.7× improvement) in WAN. For VGG16, we achieve 1.9× and 1.4× time reductions for single-image inference by batch processing in LAN and WAN, respectively. The improvement mainly comes from batch processing, which amortizes the computation and latency costs of each image.
Scalability Evaluation To present our scalability improvements against Falcon, we further measure the communication and running-time of Meteor on AlexNet and VGG16 with different batch sizes, as shown in Figure 11 in Appendix F. We have the following findings: i) Given the batch size |B|, we improve the online communication costs by ≈ 1.5× in comparison to Falcon, which is consistent with the analysis for single inference in § 7.2. ii) For the online inference running-time, we achieve comparable efficiency in LAN but ≈ 1.5× improvements in WAN compared to Falcon, which is not unexpected; after all, the communication improvements have mere gains in running-time in LAN, as analyzed in § 7.2. Also, the scalability improvement is primarily due to our proposed efficient protocols.

RELATED WORK
Secure NN inference using MPC has gained much attention recently.
In the area of two-party computation (2PC), CryptoNets [29] was one of the earliest works to use homomorphic encryption for secure NN inference, and CryptoDL [30] developed approximate low-degree polynomials to implement non-linear functions for efficiency improvements over CryptoNets. Mohassel et al. proposed SecureML [50] in the two-server setting with secret sharing and GC. Meanwhile, Liu et al. designed fast matrix multiplication protocols in MiniONN [46]. DeepSecure [62] uses GC to develop a privacy-preserving deep learning prediction framework, and Gazelle [34] combines techniques from HE and MPC to achieve fast private inference. EzPC [17] is an ABY-based [25] framework, and its follow-up works [31,56-58] focus on improving performance.
In order to overcome the performance bottleneck of 2PC, recent works introduce a third party to assist the computation. Chameleon [60] used the same technique as [46] for matrix multiplication but employed a semi-honest third party to generate the correlated randomness (multiplication triples) in an offline phase. To remove the computational bottleneck incurred by garbled circuits [6], both SecureNN [70] and CrypTFlow [42] constructed novel protocols for non-linear functions such as ReLU and Maxpool that completely avoid the use of GC with the help of a third party. What's more, schemes based on 3PC replicated secret sharing also provide better overall efficiency [49,71]; Falcon [71] is one of the most efficient methods and can evaluate large NNs such as VGG16 and AlexNet. ASTRA [18] proposed a similar sharing scheme; it communicates 2 ring elements and only requires 2 active parties in the online phase, and it acted as the basis for several other maliciously-secure frameworks [15,19,38,39,54,55]. ScionFL [10] extended ABY2.0 to the multi-party setting, and its inner-product protocol needs 2 elements per party in 2 rounds, which is 2× more expensive than ours; our method was developed independently of ScionFL. More related works can be found in [13,21,23,24,35].
There are some other works combining quantized NNs with MPC [5,22,59]. Riazi et al. [59] proposed a scheme where the weights and activations are in {−1, +1}, using GC and Oblivious Transfer (OT) to provide constant-round private inference. Quotient [5] was proposed to realize the secure computation of ternarized NNs, where the weights are in {−1, 0, 1}. The authors convert the ternarized multiplication into two binary multiplications and complete them based on OT, while all other functions are processed by GC. Therefore, prior private binary (ternarized) NN inference schemes suffer from the enormous communication costs introduced by GC, and they are even slower than secret sharing-based approaches for floating-point NNs. Recently, some works proposed to utilize hardware, such as GPUs, to accelerate the computation of MPC [20,48,51,68,72]. Specifically, GForce [51] is a 2PC inference framework; it proposed stochastic rounding and truncation layers that fuse (de)quantization between non-linear/linear layers for better efficiency, along with a suite of GPU-friendly protocols for common operations. CryptGPU [68] embedded cryptographic operations on discrete secret-shared values into floating-point operations to exploit existing CUDA kernels, and proposed several optimizations for softmax.

CONCLUSION & FUTURE WORK
In Meteor, we propose an improved 3PC secret sharing scheme based on the linearity of replicated secret sharing and construct secure blocks for secure NN inference. Extensive evaluations also demonstrate our improvements. For future work, we plan to improve other MSB Ext. methods [28,47,49] with our novel secret sharing and to reduce the setup communication costs for better efficiency.

C INPUT-INDEPENDENT RANDOMNESS GENERATION
Following Falcon [71], we generate the input-independent randomness for protocol Π_SecMSB in the setup phase as Figure 6: i) We first accomplish the bit decomposition of λ_x in ⟨·⟩-sharing using F^⟨·⟩_BitDec [49], so that it is trivial to extract ⟨msb(λ_x)⟩_2 as in [71]. Protocol Π_PreMSB is a little expensive but practical since it can be executed in the setup phase. The correctness is guaranteed and the security follows the analysis in [49,71]; we omit it for brevity.

D.1 Bit to Arithmetic Conversion for · -Sharing
The goal of functionality F^⟦·⟧_Bit2A is to generate the arithmetic sharing of a given secret bit ⟦v⟧_2 = (m_v, ⟨λ_v⟩_2). Given ⟦v⟧_2, we have v = m_v ⊕ λ_v, i.e., v^ℓ = m^ℓ_v + λ^ℓ_v − 2 · m^ℓ_v · λ^ℓ_v over Z_{2^ℓ}. In the setup phase, parties generate the ⟨·⟩-shares of the value λ^ℓ_v using F^⟨·⟩_MULT [50,54].
Other NN operators can be supported as follows: i) Activation functions such as Sigmoid and Tanh can be approximated by polynomials, which can be expressed as basic operations, and we can construct their secure protocols on top of the basic primitives. ii) Avgpool is much simpler than Maxpool. Note that the pool size is public; parties thus can compute the sum of their respective shares and truncate the sum by the pool size to get the approximate average [37,68]. As we do not employ these operators in Meteor, we omit their detailed protocols for brevity.

F ONLINE COSTS OF BATCH INFERENCE
The online costs of batch inference are illustrated in Figure 11.

G SECURITY PROOF
Proof. Let the semi-honest adversary A corrupt no more than one party. We now present the steps of the ideal-world adversary (simulator) S for A in the stand-alone model, with security under sequential composition [16]. Our simulator S for each individual protocol is constructed as follows:
Security for Π_SHARE: For the instances where A is the owner of the secret value x, S has to do nothing since A is not receiving any messages; S receives m_x from A on behalf of the honest parties. For the instances where an honest party is the owner, S sets x = 0 and follows the protocol honestly.
Security for Π_REC: To reconstruct a value x, S is given the output x, which is also the output of A. Using x and the shares corresponding to the honest parties, S computes the shares corresponding to A and sends them to A on behalf of the honest parties.
Security for Π_Lops: There is nothing to simulate as the protocol Π_Lops is non-interactive.
Security for Π_2-MULT & Π_k-MULT: For the setup phase, we consider the multiplication of ⟨·⟩-sharing as an ideal functionality F^⟨·⟩_MULT which multiplies the randomness. Since we make only black-box access to F^⟨·⟩_MULT, the simulation for it follows from the security of the underlying primitive used to instantiate F^⟨·⟩_MULT [6]. During the online phase, S follows the steps honestly using the data obtained from the corresponding setup phase.
Security for Π_SecMSB: For the setup phase, we invoke F^⟦·⟧_PreMSB in a black-box manner as Falcon [71]; therefore, the simulation follows from the security analyzed in [71]. For the online phase, we make black-box access to the remaining ideal functionalities and simulate analogously. This concludes the proof. □