Research Article | Open Access

Privacy-Preserving Decision Trees Training and Prediction

Published: 19 May 2022


Abstract

In the era of cloud computing and machine learning, data has become a highly valuable resource. Recent history has shown that the benefits brought forth by this data-driven culture come at the cost of potential data leakage. Such breaches have a devastating impact on individuals and industry, and have led the community to seek privacy-preserving solutions. A promising approach is to utilize Fully Homomorphic Encryption (\( \mathsf {FHE } \)) to enable machine learning over encrypted data, thus providing resiliency against information leakage. However, computing over encrypted data incurs a high computational overhead, thus requiring the redesign of algorithms in an “\( \mathsf {FHE } \)-friendly” manner to maintain their practicality.

In this work we focus on the ever-popular tree-based methods, and propose a new privacy-preserving solution for training and prediction of trees over data encrypted with homomorphic encryption. Our solution employs a low-degree approximation of the step function together with a lightweight interactive protocol to replace components of the vanilla algorithm that are costly over encrypted data. Our protocols for decision trees achieve practical usability, demonstrated on standard UCI datasets encrypted with fully homomorphic encryption. In addition, the communication complexity of our protocols is independent of the tree size in prediction and of the dataset size in training, which significantly improves on prior works.


1 INTRODUCTION

The ubiquity of data collected by products and services is often regarded as the key to the so-called AI revolution. User and usage information is aggregated across individuals to drive smart products, personalized experiences, and automation. To achieve these goals, stored data is accessed by multiple microservices, each performing different calculations. However, these benefits come at the cost of a threat to privacy.

The public is constantly being informed of data breaches, events which impact privacy and safety of individuals and in turn have a large negative effect on the breached service providers. Whether the leakage is passwords, private pictures and messages, or financial information, it is becoming increasingly clear that drastic measures must be taken to safeguard data that is entrusted to corporations.

There are several approaches to safeguarding data and minimizing the impact of potential breaches. Most fundamentally, encryption of data at rest ensures that even if the entire database is stolen, the data is still safe. While this may have sufficed in the past, the rise of microservice-based architectures in the cloud resulted in a large number of applications having access to the cleartext (i.e., unencrypted) information, making the attack surface uncontrollably large. Ideally, we would like to allow all these applications to operate without ever being exposed to the actual information. Recent advances in the field of Homomorphic Encryption provide some hope of achieving this level of privacy.

Fully Homomorphic Encryption (\( \mathsf {FHE } \)) [8, 9, 14, 22, 23, 43] is a type of encryption that allows computation to be performed over encrypted data (“homomorphic computation”), producing an encrypted version of the result. Concretely, \( \mathsf {FHE } \) supports addition and multiplication over encrypted data, and hence allows evaluating any polynomial. The downside of \( \mathsf {FHE } \) is the heavy cost of the multiplication operation, which imposes computational limitations on the degree of the evaluated polynomial and the number of total multiplications.
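Since \( \mathsf {FHE } \) exposes only addition and multiplication, any function to be evaluated homomorphically must first be written as a polynomial. As a cleartext stand-in (no FHE library is involved, and the function name is ours), the sketch below evaluates a polynomial using only these two operations, via Horner's rule:

```python
def horner_eval(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) using only additions and
    multiplications -- the two operations an FHE scheme supports.
    coeffs are ordered from the constant term upward."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c  # one multiplication, one addition per step
    return acc

# p(x) = 2 + 3x + x^2 evaluated at x = 4: 2 + 12 + 16 = 30
print(horner_eval([2.0, 3.0, 1.0], 4.0))  # 30.0
```

Note that over encrypted data the cost is dominated by the multiplications, which is why the paper insists on low-degree polynomials.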

Unfortunately, common computations are not “\( \mathsf {FHE } \)-friendly”, as their polynomial representation is of high degree, which is a major obstacle to the widespread deployment of computing over encrypted data in practice. In particular, machine learning models require complex calculations to train and predict, and adaptations must be made to make them practical with \( \mathsf {FHE } \). Previous work on machine learning with FHE focused mostly on training and evaluation of logistic regression models, e.g., [12, 29], and on more complex models such as shallow neural networks, e.g., [24, 39]. While these are two widely used classes of models, they are far from encompassing the entire scope of broadly used machine learning methods. In practice, tree-based models remain some of the most popular methods, ranging from single decision trees to random forests and boosting.

A decision tree is a model used for prediction, i.e., mapping a feature vector to a score or a label. The prediction is done by traversing a path from root to leaf, where the path is determined by a sequence of comparison operations “\( x_i\gt \theta \)” between a feature value \( x_i \) and a threshold \( \theta \) (continuing to right-child if satisfied, left otherwise). Training is the process of producing a decision tree from a dataset of labeled examples, with the goal of yielding accurate predictions on new unlabeled data instances.
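The root-to-leaf traversal described above can be sketched on cleartext data as follows (the `Node` structure and field names are illustrative, not taken from the paper):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: int = 0                  # index i of the feature compared
    theta: float = 0.0                # threshold of the comparison x_i > theta
    left: Optional["Node"] = None     # taken when x[feature] <= theta
    right: Optional["Node"] = None    # taken when x[feature] >  theta
    label: Optional[int] = None       # set only at leaves

def predict(node, x):
    """Traverse from root to leaf, branching on x_i > theta at each node."""
    while node.label is None:
        node = node.right if x[node.feature] > node.theta else node.left
    return node.label

# Tiny tree: the root splits on feature 0 at threshold 0.5.
tree = Node(feature=0, theta=0.5, left=Node(label=0), right=Node(label=1))
print(predict(tree, [0.7]))  # 1
```

Each comparison reveals which child is taken; it is exactly this branching that must be hidden (and replaced) when the data is encrypted.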

Any solution for decision trees over encrypted data must address how to perform the comparison operations over such data. For prediction over encrypted data, Bost et al. [6] instantiated the comparison component via an interactive protocol, yielding communication complexity proportional to the tree size; likewise, subsequent works [4, 10, 17, 27, 31, 38, 47, 48, 49, 52] have communication complexity proportional to the tree size or depth, imposing a significant burden on the bandwidth. In the context of training, existing protocols [18, 19, 21, 36, 37, 45, 50, 51, 53] consider a multi-party setting where each party holds a cleartext subset of the dataset to be trained on; in contrast, in our setting the data is encrypted and no entity in the system holds it in the clear. This leaves training decision trees over encrypted data, and prediction over encrypted data with lightweight communication, as open problems.

Elaborating on the above, this work is motivated by the enterprise setting, with the primary goal of providing a privacy-preserving solution compatible with existing enterprise architecture. In this architecture, multiple data sources collect and upload data, gradually and over time, to a centralized storage called a data lake. The data is stored encrypted in the data lake, and used by multiple microservices (referred to as the server) that perform computations on cleartext data decrypted with a key provided by the enterprise key-management service (KMS). The KMS is an entity holding enterprise secrets and keys and providing crypto-services to authorized entities, and thus must be safeguarded. As part of its safeguarding, the KMS is restricted to a lightweight and predefined functionality; in particular, it is prohibited from executing heavy or general-purpose code. Our goal is to completely eliminate the microservices' access to cleartext data, and replace it with computation over encrypted data producing an encrypted outcome (that may either be decrypted by the KMS or used in encrypted form for subsequent computations). The KMS may be employed for computation on cleartext data, provided it adheres to the aforementioned restrictions; in particular, it must be lightweight.

1.1 Our Contribution

In this work we present the first protocols for privacy-preserving decision-tree training and prediction that attain all of the following desirable properties (see Figures 8–11 and Table 1 in Section 5):

Table 1.
dataset name | # training examples | # features | # labels | Server's training time (minutes)
iris         | 100                 | 4          | 3        | 47
wine         | 119                 | 13         | 3        | 148
cancer       | 381                 | 30         | 2        | 278
digits       | 1,203               | 64         | 10       | 1785
cover        | 10,000              | 54         | 7        | 2276

Table 1. Server Run-time for Training Depth 4 Decision Trees on Encrypted UCI Datasets (128-bit security level)

(1) Prediction: a non-interactive protocol on encrypted data.

(2) Training: a \( d \)-round protocol between a server computing on encrypted data and the KMS, with communication complexity independent of the dataset size, where \( d \) is the constructed tree depth.

(3) Security: provable privacy guarantees against an adversary who follows the protocol specification but may try to learn more information (semi-honest).

(4) Practical usability: high accuracy comparable to the classical vanilla decision tree, fast prediction (less than a millisecond, amortized time), and practical training (hours) demonstrated on standard UCI datasets encrypted with \( \mathsf {FHE } \).

Our technique for comparison over encrypted data. We devise a low-degree polynomial approximation of step functions using the least squares method, and utilize our approximation for fast and accurate prediction and training over encrypted data. To achieve better accuracy in our algorithms and protocols, the approximation uses a weighting function that is zero in a window around the step and constant elsewhere. See Section 3.1.

Further applications. Our training and prediction protocols can be employed in additional settings:

  • Cross-entity: Our prediction protocol can trivially be used in settings where one company holds an unlabeled example, with the goal of learning the prediction result, and the other company holds a decision tree.

  • Secure outsourcing: Both our protocols can be employed in settings where the client is the owner of the example and tree in prediction (respectively, the dataset in training), and the server performs all computation (besides the lightweight KMS tasks performed by the client), resulting in protocols with lightweight client.

Terminology. Henceforth we use the more neutral “client” terminology rather than “KMS”, in order to capture the aforementioned wider scope of applications.

1.2 Prior Work on Privacy-Preserving Decision Trees

For prediction, prior works presented interactive protocols with communication complexity proportional to the tree size [4, 6, 10, 17, 27, 31, 47, 52] or depth [49] (cf. the non-interactive protocol in our work). Exceptions are the prior work by Lu et al. [38] and a concurrent work by Tueno et al. [48], both presenting non-interactive protocols. The protocol of [38] and the arithmetic instantiation of [48] both have communication and client complexity that grow linearly with the tree size or depth (cf. independent of the tree size and depth in our work). Moreover, these protocols [38, 48] support only low-precision data and require a specialized data encoding and finite-field homomorphic computations (cf. high-precision real-valued data in standard representation, and support for homomorphic computations over the reals, in our work). We note that their specialized data encoding is incompatible with the enterprise settings considered in our work, where data is maintained in standard representation and stored in a centralized data lake that is accessed by a variety of microservices. The paper [48] proposes another instantiation of their main prediction protocol that uses a binary representation of the input and performs bit-wise comparison at each node. The multiplicative depth required for this instantiation grows linearly in the bit length of feature values (cf. independent of the precision in our protocol). Their average running time per node, when evaluating depth-3 trees on a single sample on a 16-thread machine, is 0.188 and 8.122 seconds when instantiated with TFHE and HElib, respectively (cf. 0.074 seconds in our single-thread implementation). Their amortized running time, when employing batching in their HElib implementation, is 10 milliseconds per node and sample on a 16-thread machine (cf. 0.018 milliseconds in our single-thread implementation). Finally, since batching is supported by our underlying scheme and by HElib, but not by TFHE, we can evaluate a random forest consisting of thousands of trees with no run-time increase, in contrast to a more than \( \times 1000 \) slowdown in their TFHE implementation.

For training, the prior works [18, 19, 21, 36, 37, 45, 50, 51, 53] considered multi-party computation settings, where multiple parties communicate to train a model on the union of their private individual datasets, with the goal of preventing leakage of their private datasets. In particular, every example in the training dataset is visible in cleartext to at least one participant. Moreover, their communication complexity is proportional to the dataset size. In contrast, our work addresses the enterprise setting where all data is encrypted and no data owner sees cleartext data examples; furthermore, the communication complexity of our protocol is independent of the dataset size.

The technique of employing a low-degree approximation to speed up computation on encrypted data was previously used for several functions, such as \( ReLU \), \( Sigmoid \), and \( Tanh \); see [5, 7, 13, 26, 28, 30, 32]. Approximating the step function (equivalently, greater-than comparison) is done in [11, 38, 48] using Fermat's Little Theorem, which is not applicable to computations over the reals as considered in our work. Concurrently to our work, Cheon et al. [15, 16] presented a low-degree step-function approximation over the reals attaining optimal asymptotic complexity for \( \ell _\infty \)-approximation at a bounded distance from the threshold. The amortized run-time reported in [15, 16] for the homomorphic evaluation of their low-degree step-function \( \ell _\infty \)-approximation is at least 440 and 470 microseconds, respectively; this was further improved by \( 45\% \) in a recent follow-up work [34]. In comparison, the amortized run-time for the homomorphic evaluation of the low-degree step-function \( \ell _2 \)-approximation used in the empirical evaluation of our prediction and training algorithms is 30 and 52 microseconds, respectively (see Section 5.2, Table 2). Furthermore, our decision tree algorithms attain good accuracy both when using \( \ell _\infty \) and \( \ell _2 \) approximations for the step function (see Section 5.1, Figures 9, 10).

Table 2.
cyclotomic polynomial degree | soft-step degree 8 | soft-step degree 16 | soft-step degree 32
8192  | 0.125 s (30.5 \( \mu \)s) | 0.142 s (35.0 \( \mu \)s) | 0.200 s (49.8 \( \mu \)s)
16384 | 0.363 s (44.3 \( \mu \)s) | 0.430 s (52.5 \( \mu \)s) | 0.594 s (72.5 \( \mu \)s)

Table 2. Run-time in Seconds (Amortized Run-time in Microseconds) for Evaluating Soft-step Approximation over Encrypted Data (128-bit Security Level)


2 PRELIMINARIES

In this section we specify standard terminology and notations used throughout this paper, as well as standard definitions for uniform convergence, decision trees, CPA-security, fully homomorphic encryption and privacy-preserving protocols.

2.1 Terminology and Notations

We use the following standard notations and terminology. For \( n \in \mathbb {N} \), let \( [n] \) denote the set \( \lbrace 1,\ldots ,n\rbrace \). An \( L \)-dimensional binary vector \( y=(y_1,\dots ,y_L) \) is called the 1-hot encoding of \( \ell \in [L] \) if the \( \ell \)th entry is the only non-zero entry in \( y \). The \( \ell \)th component of a vector \( v \) is denoted by both \( v_\ell \) and \( v[\ell ] \).

A function \( \mu :\mathbb {N}\rightarrow \mathbb {R}^+ \) is negligible in \( n \) if for every positive polynomial \( p(\cdot) \) and all sufficiently large \( n \) it holds that \( \mu (n)\lt 1/p(n) \). We use \( {\mathsf {neg}}(\cdot) \) to denote a negligible function if we do not need to specify its name. Unless otherwise indicated, “polynomial” and “negligible” are measured with respect to a system parameter \( \lambda \) called the security parameter. We use the shorthand notation \( \text{PPT} \) for probabilistic polynomial time in \( \lambda \).

A probability ensemble \( X = \lbrace X(a, n)\rbrace _{a\in \lbrace 0,1\rbrace ^*,n\in \mathbb {N}} \) is an infinite sequence of random variables indexed by \( a\in \lbrace 0,1\rbrace ^* \) and \( n\in \mathbb {N} \). Two probability ensembles \( X = \lbrace X(a, n)\rbrace _{a\in \lbrace 0,1\rbrace ^*,n\in \mathbb {N}} \) and \( Y = \lbrace Y(a, n)\rbrace _{a\in \lbrace 0,1\rbrace ^*,n\in \mathbb {N}} \) are said to be computationally indistinguishable, denoted by \( X\approx _c Y \), if for every non-uniform polynomial-time algorithm \( \mathcal {D} \) there exists a negligible function \( {\mathsf {neg}} \) such that for every \( a\in \lbrace 0,1\rbrace ^* \) and every \( n\in \mathbb {N} \), \( \begin{equation*} |\Pr [\mathcal {D}(X(a, n))=1]-\Pr [\mathcal {D}(Y(a, n))=1]| \le {\mathsf {neg}}(n). \end{equation*} \)

2.2 Uniform Convergence

We use a standard notion for convergence of functions, as stated next.

Definition 2.1 (Uniform Convergence).

Let \( E \) be a set and let \( (f_{n})_{n\in \mathbb {N}} \) be a sequence of real-valued functions on \( E \). We say that \( (f_{n})_{n\in \mathbb {N}} \) uniformly converges on \( E \) to a function \( f \) if for every \( \epsilon \gt 0 \) there exists an \( n_0\in \mathbb {N} \) such that for all \( n\ge n_0 \) and \( x\in E \) it holds that \( |f_{n}(x)-f(x)|\lt \epsilon . \)

2.3 Decision Trees

A decision tree \( \mathsf {T} \) is a binary tree where each internal node corresponds to a partitioning of the input space along one dimension, and each leaf is associated with a label from \( \lbrace 1,\dots , L\rbrace \). The decision tree induces a mapping \( t:\mathbb {R}^k \rightarrow \lbrace 1,\dots , L\rbrace \) as follows. A tree \( \mathsf {T} \) is evaluated on an input sample \( x\in \mathbb {R}^k \) by traversing a path in the tree, from root to leaf, using the partitioning rule at each node to decide how to continue; when a leaf is reached, the label associated with it is returned.

The structure of a decision tree is typically learned to fit a given dataset \( (\mathcal {X},\mathcal {Y}) \) of \( n \) labeled examples, for which we ideally want: \( \forall x\in \mathcal {X}: t(x) = y_x \), i.e., every \( x \) is mapped to its corresponding label \( y_x \). The task of finding the optimal tree, that is, the tree of a given depth for which the maximal number of the aforementioned equalities holds, is known to be NP-complete [33]. Heuristics used in practice to obtain decision trees from a dataset rely on optimizing the local quality of each partitioning (i.e., each node), by selecting the dimension and threshold value that divide the data into partitions that are each “as pure as possible”. The motivation behind this local criterion is that if all data points that arrive at the same leaf have the same label, then by assigning this label to the leaf we are able to categorize this region perfectly. Several measures of purity are commonly used, and we describe the Gini impurity measure in greater detail in Section 3, Figure 3.
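As a concrete illustration of the purity criterion discussed above, here is a minimal cleartext sketch of the Gini impurity and of the size-weighted impurity of a candidate split (function names are ours):

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_l(p_l^2) of a multiset of labels;
    0.0 means the set is pure (a single label)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(labels_left, labels_right):
    """Size-weighted Gini impurity of a candidate partition; the
    training heuristic picks the feature/threshold minimizing this."""
    n = len(labels_left) + len(labels_right)
    return (len(labels_left) * gini(labels_left)
            + len(labels_right) * gini(labels_right)) / n

print(gini([1, 1, 1, 1]))  # 0.0 (pure)
print(gini([0, 1]))        # 0.5
```

A perfect split of `[0, 0, 1, 1]` into `[0, 0]` and `[1, 1]` has weighted impurity 0.0, which is why the heuristic prefers it.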

2.4 CPA-Secure Public Key Encryption

A public key encryption scheme has the following syntax and correctness requirement.

Definition 2.2 (Public-Key Encryption (PKE)).

A public-key encryption (PKE) scheme with message space \( \mathcal {M} \) is a triple \( (\mathsf {Gen},\mathsf {Enc},\mathsf {Dec}) \) of PPT algorithms satisfying the following conditions:

  • \( \mathsf {Gen} \) (key generation) takes as input the security parameter \( 1^\lambda \), and outputs a pair \( (pk,sk) \) consisting of a public key \( pk \) and a secret key \( sk \); denoted: \( (pk,sk)\leftarrow \mathsf {Gen}(1^\lambda) \).

  • \( \mathsf {Enc} \) (encryption) takes as input a public key \( pk \) and a message \( m\in \mathcal {M} \), and outputs a ciphertext \( c \); denoted: \( c \leftarrow \mathsf {Enc}_{pk}(m) \).

  • \( \mathsf {Dec} \) (decryption) takes as input a secret key \( sk \) and a ciphertext \( c \), and outputs a decrypted message \( m^{\prime } \); denoted: \( m^{\prime }\leftarrow \mathsf {Dec}_{sk}(c) \).

Correctness. The scheme is correct if for every \( (pk,sk) \) in the range of \( \mathsf {Gen}(1^\lambda) \) and every message \( m\in \mathcal {M} \), \( \begin{equation*} \Pr [ \mathsf {Dec}_{sk}(\mathsf {Enc}_{pk}(m))=m ] = 1, \end{equation*} \) where the probability is taken over the random coins of the encryption algorithm.

A PKE \( \mathcal {E}= (\mathsf {Gen}, \mathsf {Enc}, \mathsf {Dec}) \) is CPA-secure if no PPT adversary \( \mathcal {A} \) can distinguish between the encryptions of two equal-length messages \( x_0,x_1 \) of its choice. This is formally stated using the following experiment between a challenger \( \textsf {Chal} \) and the adversary \( \mathcal {A} \).

The \( \textsf {CPA} \) indistinguishability experiment \( \mathsf {EXP}^{cpa}_{\mathcal {A}, \mathcal {E}}(\lambda) \):

(1) \( \mathsf {Gen}(1^\lambda) \) is run by \( \textsf {Chal} \) to obtain keys \( (pk, sk) \).

(2) \( \textsf {Chal} \) provides the adversary \( \mathcal {A} \) with \( pk \) as well as oracle access to \( \mathsf {Enc}_{pk}(\cdot) \), and \( \mathcal {A} \) sends to \( \textsf {Chal} \) two messages \( x_0,x_1\in \mathcal {M} \) s.t. \( |x_0| = |x_1| \).

(3) \( \textsf {Chal} \) chooses a random bit \( b \in \lbrace 0,1\rbrace \), computes a ciphertext \( c \leftarrow \mathsf {Enc}_{pk}(x_b) \), and sends \( c \) to \( \mathcal {A} \). We call \( c \) the challenge ciphertext. \( \mathcal {A} \) continues to have oracle access to \( \mathsf {Enc}_{pk}(\cdot) \).

(4) \( \mathcal {A} \) outputs a bit \( b^{\prime } \).

(5) The output of the experiment is defined to be 1 if \( b^{\prime } = b \) (and 0 otherwise).

Definition 2.3 (\( \textsf {CPA} \)-security).

A public key encryption scheme \( \mathcal {E}= (\mathsf {Gen}, \mathsf {Enc}, \mathsf {Dec}) \) has indistinguishable encryptions under chosen-plaintext attacks (or is \( \textsf {CPA} \)-secure) if for all PPT adversaries \( \mathcal {A} \) there exists a negligible function \( {\mathsf {neg}} \) such that: \( \begin{equation*} \Pr \left[ \mathsf {EXP}^{cpa}_{\mathcal {A}, \mathcal {E}}(\lambda) = 1 \right] \le \frac{1}{2} + {\mathsf {neg}}(\lambda) \end{equation*} \) where the probability is taken over the random coins of \( \mathcal {A} \) and \( \textsf {Chal} \).

2.5 Fully Homomorphic Encryption

A fully homomorphic public-key encryption scheme (\( \mathsf {FHE } \)) is a public-key encryption scheme equipped with an additional PPT algorithm called \( \mathsf {Eval} \) that supports “homomorphic evaluations” on ciphertexts. For example, given two ciphertexts \( c_1 =\mathsf {Enc}(m_1) \) and \( c_2 =\mathsf {Enc}(m_2) \) encrypting messages \( m_1 \) and \( m_2 \), respectively, it is possible to produce new ciphertexts \( c_3 \) and \( c_4 \) that decrypt to \( \mathsf {Dec}(c_3)= m_1+m_2 \) and \( \mathsf {Dec}(c_4)= m_1\times m_2 \), respectively. The correctness requirement is extended to hold with respect to any sequence of homomorphic evaluations performed, using \( \mathsf {Eval}_{pk}(\cdot) \), on ciphertexts encrypted under \( pk \). A fully homomorphic encryption scheme must satisfy an additional property called compactness, requiring that the size of a ciphertext does not grow with the complexity of the sequence of homomorphic operations. The formal definition follows.

Definition 2.4 (Public-key FHE Scheme [9]).

A homomorphic (public-key) encryption scheme \( \mathcal {E}= (\mathsf {Gen}, \mathsf {Enc},\mathsf {Dec}, \mathsf {Eval}) \) with message space \( \mathcal {M} \) is a quadruple of PPT algorithms as follows:

  • \( (\mathsf {Gen}, \mathsf {Enc},\mathsf {Dec}) \) is a PKE, as specified in Definition 2.2.

  • \( \mathsf {Eval} \) (homomorphic evaluation) takes as input the public key \( pk \), a circuit \( C:\mathcal {M}^{\ell } \rightarrow \mathcal {M} \), and ciphertexts \( c_1,\dots , c_{\ell } \), and outputs a ciphertext \( \widehat{c} \); denoted: \( \widehat{c} \leftarrow \mathsf {Eval}_{pk}(C,c_1, \dots , c_{\ell }) \).

Security. A homomorphic encryption scheme is said to be secure if it is a \( \textsf {CPA} \)-secure PKE.

\( \mathcal {C} \)-homomorphism. A homomorphic encryption scheme is \( \mathcal {C} \)-homomorphic for a circuit family \( \mathcal {C} \) if for all \( C\in \mathcal {C} \) and for any set of inputs \( x_1, \dots , x_{\ell } \) to \( C \), letting \( (pk,sk)\leftarrow \mathsf {Gen}(1^\lambda) \) and \( c_i \leftarrow \mathsf {Enc}_{pk}(x_i) \), it holds that: \( \begin{equation*} \Pr [\mathsf {Dec}_{sk}(\mathsf {Eval}_{pk}(C,c_1,\ldots ,c_\ell))\ne C(x_1,\ldots ,x_\ell)]\le {\mathsf {neg}}(\lambda), \end{equation*} \) where the probability is taken over all the randomness in the experiment.

Compactness. A homomorphic encryption scheme is compact if there exists a polynomial \( p(\cdot) \) such that the decryption algorithm can be expressed as a circuit of size \( p(\lambda) \).

Fully homomorphic. A homomorphic encryption scheme is fully homomorphic if it is both compact and \( \mathcal {C} \)-homomorphic for the class \( \mathcal {C} \) of all efficiently computable circuits.

2.6 Privacy-Preserving Two-Party Protocols

Next we present the security definition specifying when a client-server protocol is privacy-preserving against a semi-honest server. A semi-honest server follows the protocol's specification, but may try to learn additional information.

Security definition. Informally, our protocols are privacy-preserving if the server cannot learn any additional information from participating in the protocol, beyond its output. In our protocols the server has no input or output, so the server should learn nothing from participating in the protocol. The requirement that the server learns nothing can be captured by requiring that the views of the server (i.e., its internal state and received messages during the protocol's execution) in executions on varying (same length) inputs are computationally indistinguishable. This is formalized in [25], Definition 2.6.2 Part 2, for the case of malicious adversaries. We present their definition, adapted to semi-honest adversaries.

Formally, our protocols involve two parties, client and server, denoted by \( \mathsf {Clnt} \) and \( \mathsf {Srv}, \) respectively, where the client has input \( x \) and the server has no input (denoted as having input \( \bot \)), and both have the security parameter \( \lambda \). The client and server interact in an interactive protocol denoted by \( \pi =\langle \mathsf {Clnt},\mathsf {Srv}\rangle \). An execution of this protocol on the client's input \( x \), no server input, and security parameter \( \lambda \) is denoted by \( \langle \mathsf {Clnt}(x),\mathsf {Srv}\rangle \). The server's view during the execution, capturing what the server has learned, is the random variable denoted by \( \textsf {view}^\pi _{\mathsf {Srv}}(x,\bot ,\lambda) \) and defined by \( \begin{equation*} \textsf {view}^\pi _{\mathsf {Srv}}(x,\bot ,\lambda) = (r, m_1,\dots ,m_t), \end{equation*} \) where \( r \) is the random coins of \( \mathsf {Srv} \), and \( m_1,\dots ,m_t \) are the messages \( \mathsf {Srv} \) received during the protocol's execution. The client's output in this execution is denoted by \( \textsf {out}^\pi _{\mathsf {Clnt}}(x,\bot ,\lambda) \). The protocol is privacy-preserving if the views of the server on (same length) inputs are computationally indistinguishable:

Definition 2.5 (Privacy-preserving Protocol).

An interactive protocol \( \langle \mathsf {Clnt},\mathsf {Srv}\rangle \) is privacy-preserving for a function \( F:\mathsf {A}\rightarrow \mathsf {B} \) if \( \mathsf {Srv} \) and \( \mathsf {Clnt} \) are PPT machines and the following holds:

  • There exists a negligible function \( {\mathsf {neg}}(\cdot) \) such that for all \( \lambda \in \mathbb {N} \) and all \( x\in \mathsf {A} \), \( \begin{equation*} \Pr \left[\textsf {out}^\pi _\mathsf {Clnt}(x,\bot ,\lambda) = F(x)\right] = 1-{\mathsf {neg}}(\lambda) . \end{equation*} \)

  • For every PPT distinguisher \( \mathcal {D} \) that chooses \( x_0,x_1\in \mathsf {A} \) such that \( |x_0|=|x_1| \) there exists a negligible function \( {\mathsf {neg}}(\cdot) \) such that for all \( \lambda \in \mathbb {N} \), it holds that: \( \begin{equation*} \left| \Pr \left[\mathcal {D}(\textsf {view}^\pi _\mathsf {Srv}(x_0,\bot ,\lambda)) = 1 \right] - \Pr \left[\mathcal {D}(\textsf {view}^\pi _\mathsf {Srv}(x_1,\bot ,\lambda)) = 1\right] \right| \le {\mathsf {neg}}(\lambda) \end{equation*} \)

where the probability is taken over the random coins of \( \mathsf {Clnt} \) and \( \mathsf {Srv} \).


3 DECISION TREES WITH APPROXIMATE STEP FUNCTION

In this section we present our algorithms for training and prediction of decision trees. The algorithms are tailored to being evaluated over encrypted data, in the sense of avoiding complexity bottlenecks of homomorphic evaluation.

The key component in our algorithms is a low-degree polynomial approximation of the step function “\( x \lt \theta \)” (a.k.a. the soft-step function); see Section 3.1. The obtained low-degree approximation replaces the step function at each tree node in our new prediction and training algorithms, presented in Sections 3.2 and 3.3, respectively.

3.1 Low Degree Approximation of a Step Function

We construct a low-degree polynomial approximating the step function. Specifically, we consider the step function \( I_0:\mathbb {R}\rightarrow \lbrace 0,1\rbrace \) with threshold zero, defined by: \( I_0(x)=1 \) if \( x \ge 0 \) and \( I_0(x)= 0 \) otherwise. We aim to replace the step function with a soft-step function, i.e., a polynomial approximation.

There are several convenient methods for replacing piece-wise continuous functions with limited-degree polynomial approximations. One approach is to consider the appropriate space of functions as a metric space, and then to find a polynomial of the desired degree that minimizes the deviation from the target function in this metric. Natural choices of metrics are the uniform error, integral square error, and integral absolute error. We opt for the mean-square integral solution; that is, the soft-step function is the solution to the following optimization problem: (1) \( \begin{equation} \phi = \mathop {\arg \min }_{p \in P_n} \int _{-2}^{2} \left(I_0(x) - p(x)\right)^2 \,dx, \end{equation} \) where \( P_n \) is the set of polynomial functions of degree at most \( n \) over the reals. Setting the interval of the approximation to be \( [-2, 2] \) is sufficient once we have pre-processed all data to be in the range \( [-1, 1] \): a soft-step at \( \theta \in [-1, 1] \) is of the form \( \phi (x - \theta) \), and thus \( x - \theta \in [-2, 2] \).

However, in many cases the sensitivity to approximation error is not uniform over the domain. Errors in an interval around the threshold may harm the overall result of the algorithm less than errors away from the threshold value. Adding an importance-weighting of the approximation interval leads to the following optimization problem: (2) \( \begin{equation} \phi = \mathop {\arg \min }_{p \in P_n} \int _{-2}^{2} \left(I_0(x) - p(x)\right)^2 \, w(x) \,dx, \end{equation} \) with a weighting function satisfying \( w(x) \ge 0 \) for all \( x \in [-2,2] \) and \( \int _{-2}^{2} w(x) \,dx = 1 \). We note that the unique solution to this problem is obtained by the projection of \( I_0 \) onto \( P_n \) in the \( w \)-weighted norm (see Chapter II in [44]), or alternatively by applying polynomial regression (minimizing MSE) on a discrete set of points sampled proportionally to \( w \).

Experiments with polynomials that solve Equation (2) for various weight functions and degrees show that by neglecting an interval around the threshold we are able to obtain tighter approximations of the linear phases of the step function. More specifically, by using weighting functions that are zero in a window around the step and constant elsewhere, a trade-off is obtained between the slope of the transition and the tightness of the approximation at the edges of the interval (Figure 1). For larger slopes (smaller neglected windows) the approximation polynomial reaches the 0\( - \)1 plateau faster, but at the price of overshooting and oscillations around the linear parts of the step function. While this can be remedied by choosing a very high-degree polynomial, computational considerations (especially with \( \mathsf {FHE } \) arithmetic) lead us to favor lower-degree polynomials.
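The discrete route mentioned after Equation (2), polynomial regression on points sampled according to \( w \), can be sketched with NumPy. A weight that is zero in a window around the step and constant elsewhere simply amounts to dropping the sample points inside the window before fitting; the degree and window values below are illustrative, not the paper's chosen parameters:

```python
import numpy as np

def soft_step_coeffs(degree, window):
    """Least-squares fit of a degree-`degree` polynomial to the step
    I_0 on [-2, 2], excluding sample points with |x| < window, i.e.,
    a weight that is zero around the step and constant elsewhere.
    Returns coefficients ordered from the constant term upward."""
    xs = np.linspace(-2.0, 2.0, 4001)
    keep = np.abs(xs) >= window          # neglect a window around 0
    xs = xs[keep]
    ys = (xs >= 0).astype(float)         # I_0: 1 if x >= 0, else 0
    return np.polynomial.polynomial.polyfit(xs, ys, degree)

phi = np.polynomial.Polynomial(soft_step_coeffs(degree=15, window=0.05))
# Away from the threshold the fit should sit near the 0/1 plateaus.
print(phi(-1.5), phi(1.5))
```

A soft-step at threshold \( \theta \) is then evaluated as `phi(x - theta)`, which is a polynomial in `x` and hence directly evaluable under \( \mathsf {FHE } \).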

Fig. 1.

Fig. 1. Polynomial approximations of \( I_0 \) in \( [-2, 2] \) , with varying polynomial degrees (rows) and width of neglected window around the transition (columns).

Approximating in \( \ell _\infty \)-norm. The problem of polynomial approximation in \( \ell _\infty \)-norm is of great importance in signal processing. The standard way of solving such problems is by applying the Remez algorithm [42]. In our case, both the MSE and the Remez approximations gave good empirical accuracy (see Figure 9), and we prefer the former to avoid the numerical instability associated with implementations of the Remez algorithm, and its larger-amplitude oscillations away from the threshold (see Figure 2).

Fig. 2.

Fig. 2. MSE (left) and Remez (right) step-function approximations of degree 1 (top), 4, 8, 16 and 24 (bottom), and neglected window width 0 (blue), 0.01 (orange), 0.05 (green).

3.2 Prediction by Decision Trees with Soft-Step Function

We present our algorithm for decision-tree based prediction (see Algorithm 1), which is tailored to achieve efficiency in prediction over encrypted data. Our algorithm is similar to the standard prediction algorithm [41], except that the step function in each node of the decision tree is replaced by its soft-step counterpart.

In more detail, our algorithm and the standard prediction algorithm share the same structure: each node \( v \) in a decision tree \( \mathsf {T} \) is associated with a threshold \( v.\theta \in \mathbb {R} \) and a feature \( v.feature\in [k] \), and the prediction is a label \( \ell \in \lbrace 1,\ldots ,L\rbrace \). The difference between the algorithms is specified next.

In the standard prediction algorithm, each new sample \( x \in \mathbb {R}^k \) is associated with a single path from root to leaf, where the path continues to the right-child if the step function \( I_{v.\theta } (x[v.feature]) \) evaluates to 1 and to the left-child otherwise. We denote by \( \mathsf {T}_x \) the path in \( \mathsf {T} \) associated with \( x \). The prediction for \( x \) is the label associated with the leaf in \( \mathsf {T}_x \).

In our prediction algorithm (see Algorithm 1), we replace the step function \( I_{v.\theta } (x[v.feature]) \) by the soft-step function \( \phi (x[v.feature] - v.\theta) \), where \( \phi \) is obtained via Equation 2. Note that by using a soft-step function we no longer traverse a single path in the tree. Instead, our algorithm traverses all paths in the tree and computes a weighted combination of all the leaves values, where each leaf value is the 1-hot encoding of the label associated with the leaf. The output is a length \( L \) vector assigning a likelihood score to each label, which is in turn interpreted as outputting the label with the highest score.
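The weighted-path evaluation can be sketched in a few lines of Python. The `Node` class, the degree-3 polynomial, and all names below are hypothetical stand-ins used only to illustrate the recursion of Algorithm 1.

```python
import numpy as np

class Node:
    """Minimal tree node for illustration; field names follow the paper."""
    def __init__(self, feature=None, theta=None, left=None, right=None,
                 leaf_value=None):
        self.feature, self.theta = feature, theta
        self.left, self.right = left, right
        self.leaf_value = leaf_value

def soft_predict(node, x, phi):
    """Every root-to-leaf path contributes its leaf value, weighted by
    the product of soft-step values along the path."""
    if node.leaf_value is not None:
        return np.asarray(node.leaf_value, dtype=float)
    z = x[node.feature] - node.theta
    return phi(z) * soft_predict(node.right, x, phi) \
         + phi(-z) * soft_predict(node.left, x, phi)

# A crude degree-3 soft step (assumed; in the paper phi comes from Eq. 2).
phi = lambda z: 0.5 + 0.75 * z - 0.25 * z ** 3
tree = Node(feature=0, theta=0.3,
            left=Node(leaf_value=[1.0, 0.0]),     # label 0
            right=Node(leaf_value=[0.0, 1.0]))    # label 1
scores = soft_predict(tree, np.array([0.9]), phi)
print(int(np.argmax(scores)))   # x[0] > theta, so label 1 wins
```

Since this \( \phi \) satisfies \( \phi(z)+\phi(-z)=1 \), the scores form a probability-like vector over the labels.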

We make a few remarks regarding extensions of Algorithm 1. (a) The algorithm straightforwardly extends to the case where each leaf is associated with a vector of labels’ likelihoods \( leaf\_value\in \mathbb {R}^L \) (rather than a 1-hot encoding of a label). (b) The algorithm naturally extends to evaluation of base-classifiers for random forests and tree boosting, hence our method supports these common tree-based machine learning methods as well. (c) Finally, we note that the algorithm can also be viewed as a “recipe” for converting an existing tree model to a polynomial form of low degree, which may be of independent interest.

We next show that Algorithm 1 outputs the same prediction as the standard algorithm (i.e., it is correct) whenever \( \phi \) is sufficiently close to \( I_0 \). Elaborating on the above, we essentially show that if \( (\phi _n)_{n\in \mathbb {N}} \) uniformly converges to the step function \( I_0 \), then for all sufficiently large \( n \), Algorithm 1, when executed with \( \phi _n \) as the soft-step function, assigns more than half the weight to the leaf associated with the path chosen by the standard algorithm, and so the outputted label is identical to the label in the standard algorithm. This is slightly oversimplified though: \( I_0 \) has a discontinuity point at 0, implying that no family of polynomials can uniformly converge to \( I_0 \) on an interval containing 0, say, \( [-2,2] \). Nonetheless, for every \( \delta \gt 0 \), there exists a family of polynomials \( (\phi _n)_{n\in \mathbb {N}} \) that uniformly converges to \( I_0 \) on the punctured interval \( [-2,2]\setminus (-\delta ,\delta) \); in particular, this holds for the polynomials computed by the Remez algorithm. Employing such polynomials, we guarantee correctness (i.e., that the outputs of Algorithm 1 and the standard algorithm are the same) for all samples that are \( \delta \)-far from the discontinuity point.

Theorem 3.1 (Correctness).

Let \( \delta \gt 0 \) and let \( (\phi _n)_{n\in \mathbb {N}} \) be a sequence of functions that uniformly converges to \( I_0 \) on \( [-2,2]\setminus (-\delta ,\delta) \). For every tree \( \mathsf {T} \) of depth \( d \), there exists \( n_0=n_0(d) \) such that for all \( n\ge n_0 \), the following holds. Algorithm 1, instantiated with \( \phi _n \) and \( \mathsf {T} \), is correct on all samples \( x \) that satisfy \( \min _{v\in \mathsf {T}}\left|x[v.feature] - v.\theta \right| \gt \delta \).

Proof. Let \( \delta \gt 0 \), and let \( (\phi _n)_{n\in \mathbb {N}} \) be a sequence of functions that uniformly converges to \( I_0 \) on \( [-2,2]\setminus (-\delta ,\delta) \). Fix a tree \( \mathsf {T} \), and let \( d \) denote its depth and \( L \) denote the labels associated with \( \mathsf {T} \). For a sample \( x \) we denote by \( \mathsf {T}_x=(v_0^*,\ldots ,v^*_d) \) the path in \( \mathsf {T} \) traversed by the standard algorithm on input \( x \), and denote by \( \ell ^* \) the label outputted by it (i.e., the label associated with the leaf of \( \mathsf {T}_x \)). We show that for all sufficiently large \( n \) (that depend only on \( d \)), Algorithm 1 instantiated with \( \phi _n \) and \( \mathsf {T} \), outputs \( \ell ^* \) on all samples \( x \) with \( \min _{v\in \mathsf {T}}\left|x[v.feature] - v.\theta \right| \gt \delta \).

For this purpose, first observe that for every sample \( x \), the weight assigned by Algorithm 1 to each path \( \mathsf {P}=(v_0,\ldots ,v_d) \) from root to leaf in \( \mathsf {T} \), denoted by \( w(\mathsf {P}) \), is the product of the weights assigned by the algorithm to each internal node \( v_i \) on the path, where the weight of each node \( v_i \) is equal to \( \phi _n \left(x[v_i.feature] - v_i.\theta \right) \) if the path continues from \( v_i \) to its right child, and it is \( \phi _n \left(v_i.\theta - x[v_i.feature] \right) \) otherwise. That is, (3) \( \begin{align} w(\mathsf {P}) = \prod _{i=0}^{d-1} \phi _n \left((x[v_i.feature] - v_i.\theta) \cdot \textsf {isRC}(v_i,v_{i+1}) \right)\!. \end{align} \) where \( \textsf {isRC}(v_i,v_{i+1})=1 \) if \( v_{i+1} \) is a right-child of \( v_i \), and \( -1 \) otherwise.

Next we analyze the weight \( w(\mathsf {P}) \) assigned by Algorithm 1 to each path \( \mathsf {P} \) from root to leaf. In Lemma 3.2 we show that for all sufficiently large \( n=n(d) \), and all \( x\in \mathbb {R}^k \) s.t. \( \min _{v\in \mathsf {T}}\left|x[v.feature] - v.\theta \right| \gt \delta \), it holds that most of the weight is assigned to the path \( \mathsf {T}_x \), that is, (4) \( \begin{align} w(\mathsf {T}_x)\gt \sum _{\mathsf {P}=(v_0,\ldots ,v_d)\ne \mathsf {T}_x}w(\mathsf {P}). \end{align} \)

Finally we derive from Equation 4 the desired conclusion that Algorithm 1 is correct by showing that it assigns the highest score to the label \( \ell ^* \) outputted by the standard algorithm. For this purpose first recall that leaf values are the 1-hot encoding of the label associated with the leaf. In particular, for the leaf \( v^*_d \) in the path \( \mathsf {T}_x \), the associated label is \( \ell ^* \) and therefore: \( \begin{equation*} v^*_d.leaf\_value [\ell ^*]=1. \end{equation*} \) Next observe that the score assigned to the label \( \ell ^* \) by Algorithm 1 is the sum of weights over all paths terminating in leaves associated with this label. In particular, this score is at least as large as the weight \( w(\mathsf {T}_x) \). That is, (5) \( \begin{align} \begin{split} &\sum _{\mathsf {P}=(v_0,\ldots ,v_d)\in \mathsf {T}} w(\mathsf {P}) \cdot v_d.leaf\_value [\ell ^*]\\ & \qquad \qquad \ge w(\mathsf {T}_x) \cdot v^*_d.leaf\_value [\ell ^*]\\ & \qquad \qquad = w(\mathsf {T}_x) \end{split} \end{align} \) Moreover, for every label \( \ell \in [L] \) such that \( \ell \ne \ell ^* \) it holds that \( \begin{equation*} w(\mathsf {T}_x) \cdot v^*_d.leaf\_value [\ell ]=0. \end{equation*} \) Therefore, the score assigned to \( \ell \) by Algorithm 1 is upper bounded by the sum of weights over all paths other than \( \mathsf {T}_x \). 
That is, the score of \( \ell \) is upper bounded by: (6) \( \begin{align} \begin{split} & \qquad \ \ \ \sum _{\mathsf {P}=(v_0,\ldots ,v_d)\in \mathsf {T}} w(\mathsf {P}) \cdot v_d.leaf\_value [\ell ] \\ & = \sum _{\mathsf {P}=(v_0,\ldots ,v_d)\in \mathsf {T}\ s.t.\ P\ne \mathsf {T}_x} w(\mathsf {P}) \cdot v_d.leaf\_value [\ell ] \\ & \le \sum _{\mathsf {P}=(v_0,\ldots ,v_d)\in \mathsf {T}\ s.t.\ P\ne \mathsf {T}_x} w(\mathsf {P}) \\ & \lt w(\mathsf {T}_x) \end{split} \end{align} \) (where the inequality before last follows from \( v_d.leaf\_value [\ell ] \) being in \( \lbrace 0,1\rbrace \), and the last inequality follows from Equation 4). We conclude that the score assigned by Algorithm 1 to \( \ell ^* \) is strictly larger than the score Algorithm 1 assigns to any other label \( \ell \), which implies that (7) \( \begin{align} \begin{split} \ell ^* = \mathop{\text{arg max}}\limits_{\ell \in [L]}\sum _{\mathsf {P}=(v_0,\ldots ,v_d)\in \mathsf {T}} w(\mathsf {P}) \cdot v_d.leaf\_value [\ell ] \end{split} \end{align} \) as desired.\( \Box \)
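The correctness statement is easy to sanity-check numerically. The sketch below, a hypothetical test harness assuming NumPy, uses a steep logistic function in place of a high-index \( \phi_n \): outside \( (-\delta,\delta) \) with \( \delta=0.2 \) it deviates from \( I_0 \) by less than \( e^{-8} \), far below the accuracy that the argument requires for depth 3.

```python
import numpy as np

rng = np.random.default_rng(0)

def build(depth, k, L):
    """Random full binary tree (illustrative harness, not the paper's code)."""
    if depth == 0:
        value = np.zeros(L)
        value[rng.integers(L)] = 1.0          # 1-hot leaf value
        return {"leaf": value}
    return {"feature": int(rng.integers(k)),
            "theta": float(rng.uniform(-0.8, 0.8)),
            "left": build(depth - 1, k, L),
            "right": build(depth - 1, k, L)}

def hard_predict(node, x):
    # standard root-to-leaf traversal
    while "leaf" not in node:
        node = node["right"] if x[node["feature"]] - node["theta"] >= 0 else node["left"]
    return int(np.argmax(node["leaf"]))

def soft_predict(node, x, phi):
    # weighted combination over all root-to-leaf paths, as in Algorithm 1
    if "leaf" in node:
        return node["leaf"]
    z = x[node["feature"]] - node["theta"]
    return phi(z) * soft_predict(node["right"], x, phi) \
         + phi(-z) * soft_predict(node["left"], x, phi)

def thresholds(node):
    if "leaf" in node:
        return []
    return ([(node["feature"], node["theta"])]
            + thresholds(node["left"]) + thresholds(node["right"]))

# A steep sigmoid stands in for phi_n (not a polynomial, purely illustrative).
phi = lambda z: 1.0 / (1.0 + np.exp(-40.0 * z))
tree = build(depth=3, k=4, L=3)
pairs = thresholds(tree)
delta = 0.2
agree = total = 0
for _ in range(1000):
    x = rng.uniform(-1, 1, size=4)
    if min(abs(x[f] - t) for f, t in pairs) <= delta:
        continue                  # the theorem only covers delta-far samples
    total += 1
    agree += hard_predict(tree, x) == int(np.argmax(soft_predict(tree, x, phi)))
print(f"{agree}/{total} delta-far samples agree with the standard algorithm")
```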

Lemma 3.2.

Let \( \delta \gt 0 \) and let \( (\phi _n)_{n\in \mathbb {N}} \) be a sequence of functions that uniformly converges to \( I_0 \) on \( [-2,2]\setminus (-\delta ,\delta) \). For every tree \( \mathsf {T} \) of depth \( d \), there exists \( n_0=n_0(d) \) such that for all \( n \ge n_0 \), the following holds. Algorithm 1, when instantiated with \( \phi _n \) and \( \mathsf {T} \), satisfies that \( \begin{equation*} w(\mathsf {T}_x)\gt \sum _{\mathsf {P}=(v_0,\ldots ,v_d)\ne \mathsf {T}_x}w(\mathsf {P}) \end{equation*} \) for all samples \( x \) s.t. \( \min _{v\in \mathsf {T}}\left|x[v.feature] - v.\theta \right| \gt \delta \).

Proof. The proof idea is as follows. If we were using in Algorithm 1 the step-function \( I_0 \), rather than \( \phi _n \), then we would get that the weight \( w(\mathsf {T}_x) \) is one whereas the weight \( w(\mathsf {P}) \) is zero for any path \( \mathsf {P}\ne \mathsf {T}_x \) from root to leaf. Now, by taking sufficiently large \( n \), we can ensure that \( \phi _n(z) \) is as close as we’d like to \( I_0(z) \) on all \( z\notin (-\delta ,\delta) \). In particular, we can enforce \( w(\mathsf {T}_x) \gt 1/2 \) and \( \sum _{\mathsf {P}\ne \mathsf {T}_x}w(\mathsf {P}) \lt 1/2 \) as long as \( \left|x[v.feature] - v.\theta \right| \gt \delta \) for all \( v\in \mathsf {T} \). In this case indeed \( w(\mathsf {T}_x) \gt \sum _{\mathsf {P}\ne \mathsf {T}_x}w(\mathsf {P}) \). We proceed with the formal details.

Fix \( \delta \gt 0 \) and let \( \mathsf {T} \) be a tree of depth \( d \). Set \( \epsilon = \epsilon (d)=\min \lbrace 1 - 2^{-1/d}, 2^{-2d} \rbrace \). The premise of uniform convergence implies that there exists \( n_0=n_0(\epsilon)=n_0(d) \) such that for all \( n\ge n_0 \), the following holds: \( \begin{equation*} |\phi _n(z)-I_0(z)|\lt \epsilon \mbox{ for all }z\in [-2,2]\setminus (-\delta ,\delta). \end{equation*} \) Since \( \min _{v\in \mathsf {T}}\left|x[v.feature] - v.\theta \right| \gt \delta \), the above implies that for all \( v\in \mathsf {T} \), (8) \( \begin{eqnarray} |\phi _n \left(x[v.feature] - v.\theta \right) - I_0\left(x[v.feature] - v.\theta \right)| \lt \epsilon \end{eqnarray} \) and likewise \( |\phi _n \left(v.\theta - x[v.feature] \right) - I_0\left(v.\theta - x[v.feature] \right)| \lt \epsilon . \)

We first lower bound the weight of the path \( \mathsf {T}_x=(v_0^*,\ldots ,v^*_d) \) traversed by the standard algorithm. For all nodes \( v^*_i\in \mathsf {T}_x \), it holds by the definition of \( \mathsf {T}_x \) that \( I_0((x[v^*_i.feature] - v^*_i.\theta) \cdot \textsf {isRC}(v^*_i,v^*_{i+1})) =1 \), implying by Equation 8 that the weight assigned to \( \mathsf {T}_x \) by Algorithm 1 is at least: \( \begin{equation*} w(\mathsf {T}_x) = \prod _{i=0}^{d-1} \phi _n \left((x[v^*_i.feature] - v^*_i.\theta) \cdot \textsf {isRC}(v^*_i,v^*_{i+1}) \right) \gt (1-\epsilon)^d. \end{equation*} \) Assigning \( \epsilon \le 1 - 2^{-1/d} \) we get that: (9) \( \begin{eqnarray} w(\mathsf {T}_x) \gt 1/2 . \end{eqnarray} \)

We next upper bound the sum of weights over all paths other than \( \mathsf {T}_x \). Let \( \mathsf {P}=(v_0,\ldots ,v_d) \) be a path from root to leaf in \( \mathsf {T} \) such that \( \mathsf {P}\ne \mathsf {T}_x \). Let \( i\in \lbrace 1,\ldots ,d \rbrace \) be the first index where \( v_i\ne v^*_i \). Namely, in \( v_{i-1} \) \( \mathsf {P} \) proceeds to a different child than \( \mathsf {T}_x \) implying that: \( \begin{equation*} I_0\left(\left(x[v_{i-1}.feature] - v_{i-1}.\theta \right) \cdot \textsf {isRC}(v_{i-1},v_i) \right) = 0 . \end{equation*} \) By Equation 8, the above implies that: \( \begin{equation*} \phi _n \left(\left(x[v_{i-1}.feature] - v_{i-1}.\theta \right) \cdot \textsf {isRC}(v_{i-1},v_i) \right) \lt \epsilon . \end{equation*} \) Furthermore, for any other internal node \( v_{j}\in \mathsf {P} \) (\( j\ne i-1 \)), since \( I_0 \) is upper bounded by 1, then by Equation 8 it holds that: \( \begin{equation*} \phi _n \left((x[v_j.feature] - v_j.\theta) \cdot \textsf {isRC}(v_j,v_{j+1}) \right) \lt 1+\epsilon . \end{equation*} \) Put together, the product of all these values is upper bounded by: \( \begin{equation*} w(\mathsf {P}) \lt \epsilon \cdot (1+\epsilon)^{d-1}. \end{equation*} \) The total weight of all paths in \( \mathsf {T} \) besides \( \mathsf {T}_x \) is therefore upper bounded by: (10) \( \begin{eqnarray} \sum _{ \mathsf {P}\ne \mathsf {T}_x}w(\mathsf {P}) \lt (2^d - 1)\cdot \epsilon \cdot (1+\epsilon)^{d-1} \lt 2^{2d-1}\cdot \epsilon \le 1/2 \end{eqnarray} \) where the last two inequalities hold since \( \epsilon \lt 1 \) and \( \epsilon \le 2^{-2d} \), respectively. This concludes the proof, as desired.

Concretely, we have shown the following:

Corollary 3.3.

For every real \( \delta \gt 0 \), tree \( \mathsf {T} \) of depth \( d \), and \( \phi :[-2,2]\rightarrow \mathbb {R} \) such that \( |\phi (z)-I_0(z)| \lt \min \lbrace 1 - 2^{-1/d}, 2^{-2d} \rbrace \) for all \( z\in [-2,2]\setminus (-\delta ,\delta) \), the following holds: Algorithm 1, when instantiated with \( \phi \) and \( \mathsf {T} \), is correct on all samples \( x \) s.t. \( \min _{v\in \mathsf {T}}\left|x[v.feature] - v.\theta \right| \gt \delta \).
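To get a feel for how demanding this accuracy requirement is, the bound can be tabulated per depth with a few lines of plain Python; the \( 2^{-2d} \) term is the binding one already for small \( d \), so deeper trees need much tighter, hence higher-degree, soft-step approximations.

```python
# Required accuracy eps(d) = min{1 - 2^(-1/d), 2^(-2d)} from Corollary 3.3.
# The 2^(-2d) term dominates for small d already, so the required accuracy
# shrinks exponentially with the tree depth.
for d in (2, 4, 8):
    eps = min(1 - 2 ** (-1 / d), 2 ** (-2 * d))
    print(f"depth {d}: need |phi - I0| < {eps:.3g}")
```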

3.3 Training Decision Trees with Soft-Step Function

The standard training procedure considers splits that partition the training dataset \( \mathcal {X} \), and builds the tree according to local objectives based on the number of examples of each label that flow to each side of the chosen, per-node, split. At each node, an impurity score is computed for every candidate split, and the feature and threshold are then chosen to minimize some impurity measure, e.g., the weighted Gini impurity (see Figure 3).

Fig. 3.

Fig. 3. The weighted Gini impurity computation.

Traditionally, the training procedure associates with each node a set of indicators \( W=\lbrace w_x\rbrace _{x\in \mathcal {X}} \), where \( w_x \) is a bit indicating whether example \( x \) participates in the training of the sub-tree rooted at this node. \( W \) is updated for the children nodes as follows: for the chosen feature \( i^* \) and threshold \( \theta ^* \), the right sub-tree (respectively, left sub-tree) receives (11) \( \begin{equation} \forall x \in \mathcal {X}: w^{\mathsf {right}}_x = w_x \cdot I_{\theta ^*}(x[i^*])\;\;\;\; \left(\text{resp}.\ w^{\mathsf {left}}_x = w_x \cdot (1-I_{\theta ^*}(x[i^*]))\right) \end{equation} \)

In our approach, to avoid the comparison operation, which is expensive over encrypted data, we replace the step function \( I_0 \) by the low-degree polynomial approximation \( \phi \) obtained via Equation 2.

Notice that our approximated version of Equation 11 has real-valued weights instead of Boolean indicators. This means that every example reaches every node and is evaluated at all nodes. The result is a soft partition of the data, where the two children nodes each receive every data point, weighted differently, rather than a hard split. To efficiently keep track of the weight of each data example at each node, we maintain a weight set \( W \) while constructing the tree during training. All weights are initialized to 1, and are recursively multiplied by the polynomial approximation at the current node before being passed on to the children nodes. The details of the training algorithm are presented in Algorithm 2.
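The soft weight update can be sketched as follows, assuming NumPy; the degree-3 polynomial is an assumed stand-in for a soft step obtained via Equation 2, and the function name is ours.

```python
import numpy as np

def soft_split_weights(W, X, feature, theta, phi):
    """Soft analogue of Equation (11): every example reaches both children,
    with its weight scaled by the soft-step value."""
    z = X[:, feature] - theta
    return W * phi(z), W * phi(-z)   # (right-child weights, left-child weights)

phi = lambda z: 0.5 + 0.75 * z - 0.25 * z ** 3  # assumed low-degree soft step
X = np.array([[0.9], [-0.4], [0.1]])
W = np.ones(3)
w_right, w_left = soft_split_weights(W, X, feature=0, theta=0.0, phi=phi)
# Since phi(z) + phi(-z) = 1 for this phi, each example's mass is conserved
# across the two children.
print(np.round(w_right + w_left, 6))
```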

Looking ahead, we carefully divide the operations in Algorithm 2 into those whose homomorphic evaluation on encrypted data is “efficient” vs. “costly”. Concretely, addition, multiplication, and evaluating the low-degree polynomial \( \phi \) are efficient, whereas computing the Gini impurity (Figure 3) involves the costly division and argmin operations.
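For concreteness, a cleartext weighted Gini computation over soft weights might look as follows; the exact quantity minimized is specified in Figure 3, and the function names here are hypothetical. Note the division by the weight totals, which is one of the operations that is costly under \( \mathsf {FHE } \).

```python
import numpy as np

def weighted_gini(w_side, labels, L):
    """Gini impurity of one child, computed from soft example weights."""
    counts = np.array([w_side[labels == l].sum() for l in range(L)])
    total = counts.sum()
    if total == 0.0:
        return 0.0, 0.0
    p = counts / total                    # costly division under FHE
    return 1.0 - float(np.sum(p ** 2)), float(total)

def split_score(w_right, w_left, labels, L):
    """Weighted average of the two children's impurities."""
    g_r, n_r = weighted_gini(w_right, labels, L)
    g_l, n_l = weighted_gini(w_left, labels, L)
    return (n_r * g_r + n_l * g_l) / (n_r + n_l)

labels = np.array([0, 0, 1, 1])
pure = split_score(np.array([0.0, 0.0, 1.0, 1.0]),
                   np.array([1.0, 1.0, 0.0, 0.0]), labels, L=2)
mixed = split_score(np.array([0.5, 0.5, 0.5, 0.5]),
                    np.array([0.5, 0.5, 0.5, 0.5]), labels, L=2)
print(pure, mixed)   # a pure split scores 0; a totally mixed one scores 0.5
```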

Skip 4PREDICTION AND TRAINING ON ENCRYPTED DATA Section

4 PREDICTION AND TRAINING ON ENCRYPTED DATA

In this section we present our secure protocols for prediction and training of tree based models, where both the training dataset and the data for prediction are encrypted with \( \mathsf {FHE } \). The protocols are adaptations of Algorithms 1 and 2 from Section 3 to interactive settings in which the data remains encrypted throughout the computation. See our protocol for prediction over encrypted data in Figures 4 and 5 in Section 4.1, and our protocol for training over an encrypted dataset in Figures 6 and 7 in Section 4.2.

Fig. 4.

Fig. 4. The prediction protocol \( \mathsf {PP}=\langle \mathsf {Clnt}_\mathsf {PP},\mathsf {Srv}_\mathsf {PP}\rangle \) for decision trees. The server homomorphically evaluates a weighted sum of the leaf values to obtain an encrypted vector of labels’ scores. The client decrypts and outputs the label with highest score.

Fig. 5.

Fig. 5. The subroutine \( \mathsf {Enc\_Predict}(\cdot ,\cdot) \) operates recursively on a node and ciphertext pair. The subroutine is an adjustment of Algorithm 1 to operate over encrypted data.

Fig. 6.

Fig. 6. The training protocol is \( \mathsf {TP}=\langle \mathsf {Clnt}_\mathsf {TP},\mathsf {Srv}_\mathsf {TP}\rangle \) , constructing a decision tree \( \mathsf {T} \) in BFS manner. The computation phase (Step 2) is an adaptation of Algorithm 2 to a protocol that operates over encrypted data (denoting encrypted values by placing them within \( [\![ \cdot ]\!] \) ). The protocol can be executed in parallel to train a Random Forest.

Fig. 7.

Fig. 7. The sub-protocol \( \mathsf {Enc\_Train} \) constructs the node \( v \) from the encrypted examples, labels and their weights at \( v \) . The outcome is encrypted values for \( v.feature \) , \( v.\theta \) , \( \mathsf {W}_{v.right} \) and \( \mathsf {W}_{v.left} \) when \( v \) is an internal node ( \( v.leaf\_value \) when \( v \) is a leaf).

4.1 Decision-Tree based Prediction on Encrypted Data

We present our privacy preserving protocol for decision tree based prediction. The protocol is between a client holding a data sample and a server holding a tree, and consists of the following steps: the client encrypts the data sample and sends the encrypted sample to the server; the server homomorphically evaluates Algorithm 1 on the encrypted sample and sends the encrypted outcome to the client; the client decrypts to obtain the prediction. The protocol is privacy preserving for Algorithm 1; see Definition 2.5 and Theorem 4.1. The client’s complexity and the communication complexity are proportional to the size of the encrypted input and output, and are independent of the tree size and depth; the server’s complexity is linear in the tree size and the number of labels, and polynomial in the security parameter.

Our protocol can be executed on any tree based model such as Random Forest or Boosted Tree algorithms. Our protocol extends to the case where both the sample and the decision tree are encrypted, providing secrecy for both the data sample and the tree.

See the details of our protocol on an encrypted sample and a cleartext tree in Figures 4 and 5; the extension to encrypted trees below; and the privacy and complexity analysis in Theorem 4.1.

4.1.1 Extension to Prediction on Encrypted Data and Encrypted Tree.

For certain applications it is required that the tree used for prediction remain hidden from the server. A minor modification to the protocol in Figure 4 transforms it into a protocol that keeps both the data sample and the tree private. Essentially, the tree is transmitted encrypted to the server (not necessarily by the client), and the protocol in Figure 4 is modified to be executed on an encrypted sample and an encrypted tree.

In detail, the tree \( \mathsf {T} \) is encrypted as follows. For each node \( v \) in \( \mathsf {T} \), the field \( v.feature \) is first transformed into a 1-hot encoding (i.e., a binary vector of dimension \( k \) with a single non-zero entry at the index specified by the feature). Then the fields \( v.feature \) (in the 1-hot encoding), \( v.\theta \) and \( v.leaf\_value \) are encrypted with the \( \mathsf {FHE } \) using the public key \( pk \) (generated in Figure 4, Step 1a); denote the resulting ciphertexts by \( \mathsf {\tilde{v}.feature} \), \( \mathsf {\tilde{v}.\theta } \) and \( \mathsf {\tilde{v}.leaf\_value} \), respectively. The fields \( v.right \) and \( v.left \) are not encrypted, as they are not secret. The protocol is modified as follows. In Figure 5, Step 2, instead of using \( v.feature \) to directly access the desired entry in \( \mathbf {c_x} \), we use the encrypted 1-hot encoding \( \mathsf {\tilde{v}.feature} \); specifically, we replace in the homomorphically evaluated formula each occurrence of \( \mathbf {c_x}[v.feature] \) by \( \sum _{i\in [k]}\mathbf {c_x}[i]\cdot \mathsf {\tilde{v}.feature}[i] \). Finally, in Figure 5, Step 2, instead of returning the cleartext value \( v.leaf\_value \) we return the encrypted value \( \mathsf {\tilde{v}.leaf\_value} \).

This extension preserves privacy of both the sample \( x \) and the tree \( \mathsf {T} \). Moreover, the complexity is similar to the original protocol: the server only computes \( k \) additional multiplications and additions per node, for \( k \) the number of features, and the overall multiplicative depth of the homomorphic evaluation grows only by 1.
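The oblivious feature access above can be illustrated in cleartext; under \( \mathsf {FHE } \) both vectors would be ciphertexts and the inner product would be evaluated homomorphically, which is the source of the \( k \) extra multiplications and additions per node.

```python
import numpy as np

# Cleartext mock-up of the oblivious feature access: instead of indexing
# c_x by v.feature, take an inner product with the 1-hot encoding of the
# feature index. Values here are illustrative.
k = 5
x = np.array([0.1, -0.7, 0.4, 0.9, -0.2])   # the (would-be encrypted) sample
one_hot = np.zeros(k)                        # the 1-hot encoding of v.feature
one_hot[3] = 1.0                             # this node tests feature 3
selected = float(np.dot(x, one_hot))         # k multiplications + additions
print(selected)
```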

Theorem 4.1 (Privacy-preserving Prediction).

\( \mathsf {PP}=\langle \mathsf {Clnt}_\mathsf {PP},\mathsf {Srv}_\mathsf {PP}\rangle \) (Figure 4) is a privacy-preserving protocol for Algorithm 1 provided that the underlying encryption scheme \( \mathcal {E} \) is CPA-secure. Moreover, the computation phase (Figure 4 Step 2) is non-interactive; the complexity of \( \mathsf {Srv}_\mathsf {PP} \) is \( O(m\cdot L)\cdot poly(\lambda) \), where \( m \) denotes the number of decision tree nodes, \( L \) the number of labels, and \( \lambda \) is the security parameter; the complexity of \( \mathsf {Clnt}_\mathsf {PP} \) is proportional to encrypting the input and decrypting the output. The multiplicative depth required by \( \mathsf {PP} \) is \( \log (deg(\phi)) + \log (d) \), where \( deg(\phi) \) is the degree of the soft-step function \( \phi \) and \( d \) denotes the depth of \( \mathsf {T} \).

Proof [of Theorem 4.1]. Let \( \mathcal {E}= (\mathsf {Gen}, \mathsf {Enc}, \mathsf {Dec},\mathsf {Eval}) \) denote the \( \mathsf {FHE } \) encryption scheme used in the protocol \( \mathsf {PP}=\langle \mathsf {Clnt}_\mathsf {PP},\mathsf {Srv}_\mathsf {PP}\rangle \), and suppose \( \mathcal {E} \) is CPA-secure. To prove that the protocol is privacy-preserving we prove it is \( \text{PPT} \), complete and private. Furthermore, we analyze its complexity.

We first analyze the complexity of \( \mathsf {PP} \) and prove it is \( \text{PPT} \). \( \mathsf {Clnt}_\mathsf {PP} \), given input \( x\in [-1,1]^k \), performs the following operations: a single execution of \( \mathsf {Gen} \), \( k \) executions of \( \mathsf {Enc} \) (one for each feature in \( x \)), \( L \) executions of \( \mathsf {Dec} \) (one for each label weight in the output result \( \mathsf {p\_res} \)), and computing the maximum of the resulting \( L \) values. Since \( \mathsf {Gen},\mathsf {Enc},\mathsf {Dec} \) all have complexity \( poly(\lambda) \), we conclude that \( \mathsf {Clnt}_\mathsf {PP} \) is \( \text{PPT} \) and its complexity is proportional to encrypting the input and decrypting the output. \( \mathsf {Srv}_\mathsf {PP} \) performs a constant number of basic homomorphic operations (i.e., additions and multiplications) for each internal node, and \( O(L) \) for each leaf, where each basic homomorphic operation has complexity \( poly(\lambda) \). We conclude that \( \mathsf {Srv}_\mathsf {PP} \) is \( \text{PPT} \) with complexity \( O(m\cdot L)\cdot poly(\lambda) \).

The multiplicative depth required by \( \mathsf {PP} \) is the multiplicative depth required to compute Figure 5. In particular, it is equivalent to the multiplicative depth required for the following computation for each path \( \mathsf {P}=(v_0,\ldots ,v_d) \) from root to leaf in \( \mathsf {T} \), (12) \( \begin{align} v_d.leaf\_value\cdot \prod _{i=0}^{d-1} \phi \big ((\mathbf {c_x}[v_i.feature] - v_i.\theta) \cdot \textsf {isRC}(v_i,v_{i+1}) \big). \end{align} \) where \( \textsf {isRC}(v_i,v_{i+1})=1 \) if \( v_{i+1} \) is a right-child of \( v_i \), and \( -1 \) otherwise. Equation 12 requires \( \log (deg(\phi)) + \log (d) \) multiplicative depth: at the lower level, each term is evaluated, consuming \( \log (deg(\phi)) \) multiplicative depth, and then these results are multiplied using a fan-in 2 circuit of depth \( \log (d) \).

We next prove that \( \mathsf {PP}=\langle \mathsf {Clnt}_\mathsf {PP},\mathsf {Srv}_\mathsf {PP}\rangle \) is complete. Observe that \( \mathsf {PP} \) homomorphically evaluates the same function as computed in Algorithm 1, and hence completeness follows immediately from the correctness of \( \mathcal {E} \).

Finally, we prove that \( \mathsf {PP} \) satisfies the privacy condition of Definition 2.5. Assume towards contradiction that privacy does not hold for \( \mathsf {PP} \). That is, there exists a PPT distinguisher \( \mathcal {D} \) that chooses \( x_0,x_1\in \mathsf {A} \) with \( |x_0|=|x_1| \), and a polynomial \( p(\cdot) \) such that for infinitely many \( \lambda \in \mathbb {N} \): (13) \( \begin{align} \Pr \left[\mathcal {D}\left(\textsf {view}^\mathsf {PP}_{\mathsf {Srv}_\mathsf {PP}}(x_1,\bot ,\lambda)\right) = 1 \right] - \Pr \left[\mathcal {D}\left(\textsf {view}^\mathsf {PP}_{\mathsf {Srv}_\mathsf {PP}}(x_0,\bot ,\lambda)\right) = 1\right] \ge 1/p(\lambda) \end{align} \) We show that given \( \mathcal {D} \) we can construct an adversary \( \mathcal {A} \) that violates the \( \textsf {CPA} \) security of \( \mathcal {E} \). The adversary \( \mathcal {A} \) participates in \( \mathsf {EXP}^{cpa}_{\mathcal {A},\mathcal {E}} \) as follows:

(1)

Upon receiving \( pk \), execute \( \mathcal {D} \) to obtain \( x_0,x_1 \), and send them to \( \textsf {Chal} \).

(2)

Upon receiving \( \mathbf {c}_x\leftarrow \mathsf {Enc}_{pk}(x_b) \), behave exactly as \( \mathsf {Srv}_\mathsf {PP} \) behaves while executing \( \mathsf {PP} \) upon receiving \( (\mathbf {c}_x, pk) \) from \( \mathsf {Clnt}_\mathsf {PP} \).

(3)

Run the distinguisher \( \mathcal {D} \) on \( \textsf {view}^\mathsf {PP}_{\mathsf {Srv}_\mathsf {PP}} \) (\( \mathsf {Srv}_\mathsf {PP} \)’s view in \( \mathcal {A} \) during step 2) and output whatever \( \mathcal {D} \) outputs.

The adversary \( \mathcal {A} \) is \( \text{PPT} \) due to \( x_0,x_1 \) being efficiently samplable and \( \mathsf {Srv}_\mathsf {PP} \) and \( \mathcal {D} \) being PPT. We denote by \( \textsf {view}^{\mathsf {EXP}^{cpa}}_{\mathsf {Srv}_\mathsf {PP}}(x_{b^*},\bot ,\lambda) \) the view of \( \mathsf {Srv}_\mathsf {PP} \), simulated by \( \mathcal {A} \), in the execution of \( \mathsf {EXP}^{cpa}_{\mathcal {A},\mathcal {E}} \) with bit \( b^* \) being selected by the challenger. Since \( \mathcal {A} \) behaves exactly as \( \mathsf {Srv}_\mathsf {PP} \) in \( \mathsf {PP} \), it holds that for every \( b^*\in \lbrace 0,1\rbrace \), (14) \( \begin{align} \Pr \left[\mathcal {D}\left(\textsf {view}^\mathsf {PP}_{\mathsf {Srv}_\mathsf {PP}}(x_{b^*},\bot ,\lambda)\right) = 1 \right] = \Pr \left[\mathcal {D}\left(\textsf {view}^{\mathsf {EXP}^{cpa}}_{\mathsf {Srv}_\mathsf {PP}}(x_{b^*},\bot ,\lambda)\right)=1\right] \end{align} \) From Equations 13 and 14 it follows that: (15) \( \begin{align} \Pr \left[\mathcal {D}\left(\textsf {view}^{\mathsf {EXP}^{cpa}}_{\mathsf {Srv}_\mathsf {PP}}(x_1,\bot ,\lambda)\right)=1\right] -\Pr \left[\mathcal {D}\left(\textsf {view}^{\mathsf {EXP}^{cpa}}_{\mathsf {Srv}_\mathsf {PP}}(x_0,\bot ,\lambda)\right)=1\right]\ge 1/p(\lambda) \end{align} \) Therefore, we obtain that:

\( \begin{align*} \Pr \left[\mathsf {EXP}^{cpa}_{\mathcal {A},\mathcal {E}}(\lambda)=1\right] &= \frac{1}{2}\cdot \Pr \left[\mathcal {D}\left(\textsf {view}^{\mathsf {EXP}^{cpa}}_{\mathsf {Srv}_\mathsf {PP}}(x_1,\bot ,\lambda)\right)=1\right] + \frac{1}{2}\cdot \Pr \left[\mathcal {D}\left(\textsf {view}^{\mathsf {EXP}^{cpa}}_{\mathsf {Srv}_\mathsf {PP}}(x_0,\bot ,\lambda)\right)=0\right] \\ &= \frac{1}{2} + \frac{1}{2}\cdot \left(\Pr \left[\mathcal {D}\left(\textsf {view}^{\mathsf {EXP}^{cpa}}_{\mathsf {Srv}_\mathsf {PP}}(x_1,\bot ,\lambda)\right)=1\right] - \Pr \left[\mathcal {D}\left(\textsf {view}^{\mathsf {EXP}^{cpa}}_{\mathsf {Srv}_\mathsf {PP}}(x_0,\bot ,\lambda)\right)=1\right]\right) \\ &\ge \frac{1}{2} + \frac{1}{2p(\lambda)} \end{align*} \) where the last inequality follows from Equation 15. Combining this with \( \mathcal {A} \) being \( \text{PPT} \), we derive a contradiction to \( \mathcal {E} \) being \( \textsf {CPA} \) secure. This concludes the proof.\( \Box \)

4.2 Decision-Tree Training on Encrypted Data

In this section we present our privacy-preserving protocol for training decision trees and Random Forests. Our protocol is a careful adaptation of Algorithm 2 to the client-server setting that places the computational burden almost fully on the server: the computation phase (Figure 6 Step 2) attains client and communication complexity that are independent of the training dataset size, whereas only the server’s complexity grows with the training dataset size. The client’s only dependence on the dataset size is when the data is encrypted and uploaded to the server (Figure 6 Step 1). See Figures 6 and 7 and Theorem 4.2.

The computation phase of our protocol (Figure 6 Step 2) is a homomorphic evaluation of Algorithm 2, but with the client’s aid for a few lightweight computations. Concretely, we replace all operations on cleartext data with their corresponding homomorphic operations on encrypted data, except for computing the impurity score (Line 9 in Algorithm 2), which is done with the aid of the client. To compute the impurity score, the server first homomorphically aggregates encrypted data samples from the training set into \( |S|\cdot k\cdot L \) ciphertexts and sends these encrypted aggregates to the client (where \( |S| \) is the number of considered thresholds, \( k \) the number of features, and \( L \) the number of labels). Next, the client decrypts these aggregates and computes the impurity score on the resulting cleartext values (Figure 3) to obtain the chosen threshold and feature. The client then encrypts the chosen threshold and feature and sends the ciphertexts to the server. This procedure attains the desired complexity and privacy: the client’s complexity is independent of the dataset size, and the server is exposed only to encrypted values; see Theorem 4.2.
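One plausible cleartext rendering of the aggregation step is sketched below; the exact aggregates the server computes are specified in Figure 7, and all names here are illustrative. Per candidate threshold \( s \), feature \( i \), and label \( \ell \), the entry is the soft weight flowing right, so the client can evaluate impurities from \( |S|\cdot k\cdot L \) decrypted values regardless of \( n \).

```python
import numpy as np

def server_aggregates(X, Y, W, thresholds, L, phi):
    """Soft per-(threshold, feature, label) counts: one natural choice of
    the |S| * k * L aggregates. Under FHE each entry would be a ciphertext
    obtained by homomorphic additions and multiplications."""
    n, k = X.shape
    A = np.zeros((len(thresholds), k, L))
    for si, s in enumerate(thresholds):
        R = phi(X - s)                       # n x k soft right-indicators
        for l in range(L):
            A[si, :, l] = (W[:, None] * R)[Y == l].sum(axis=0)
    return A

phi = lambda z: 1.0 / (1.0 + np.exp(-40.0 * z))   # stand-in soft step
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 3))
Y = rng.integers(0, 2, 200)
A = server_aggregates(X, Y, np.ones(200), thresholds=[-0.5, 0.0, 0.5],
                      L=2, phi=phi)
print(A.shape)   # (|S|, k, L) = (3, 3, 2)
```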

We remark that although the protocol in Figure 6 is presented with respect to the \( \mathsf {Gini} \) impurity measure, it can be instantiated with other standard impurity measures, such as entropy. Furthermore, the leaves of \( \mathsf {T} \) can be post-processed by the client to associate a single label with each leaf instead of a likelihood score vector.

Theorem 4.2 (Privacy-preserving Training).

\( \mathsf {TP}=\langle \mathsf {Clnt}_\mathsf {TP},\mathsf {Srv}_\mathsf {TP}\rangle \) (Figure 6) is a privacy-preserving protocol for Algorithm 2 provided that the underlying encryption scheme \( \mathcal {E} \) is CPA-secure. Moreover, the computation phase (Figure 6 Step 2) is a \( d \)-round protocol for depth \( d \) trees, with \( \mathsf {Clnt}_\mathsf {TP} \) and communication complexity \( O(m\cdot k\cdot |S|\cdot L)\cdot poly(\lambda) \), and \( \mathsf {Srv}_\mathsf {TP} \) complexity \( O(n\cdot m \cdot |S| \cdot k\cdot L)\cdot poly(\lambda) \), where \( n \) is the training set size, \( k,L,m,|S| \) denote the number of features, labels, decision tree nodes and considered thresholds, respectively, and \( \lambda \) is the security parameter. The multiplicative depth required by \( \mathsf {TP} \) is \( \log (deg(\phi)) + d + 1 \), where \( deg(\phi) \) is the degree of the soft-step function \( \phi \) and \( d \) denotes the depth of \( \mathsf {T} \).

Proof [of Theorem 4.2]. Let \( \mathcal {E}= (\mathsf {Gen}, \mathsf {Enc}, \mathsf {Dec},\mathsf {Eval}) \) denote the \( \mathsf {FHE } \) encryption scheme used in the protocol \( \mathsf {TP}=\langle \mathsf {Clnt}_\mathsf {TP},\mathsf {Srv}_\mathsf {TP}\rangle \), and suppose \( \mathcal {E} \) is CPA-secure. To prove that the protocol is privacy-preserving we prove it is \( \text{PPT} \), complete and private. Furthermore, we analyze its complexity.

We first analyze the complexity of \( \mathsf {TP} \) and prove that it is \( \text{PPT} \). As in the protocol, we denote by \( k,L,m,|S| \) the number of features, labels, decision tree nodes and considered thresholds, respectively, and by \( \lambda \) the security parameter. First, we analyze \( \mathsf {Srv}_\mathsf {TP} \). In the protocol, \( \mathsf {Srv}_\mathsf {TP} \) performs \( n \cdot L \) homomorphic multiplications and \( n \) homomorphic additions for each leaf (Step 2d, Figure 7), as well as for each internal node and each threshold and feature (Step 2d, Figure 7), plus another \( k \) homomorphic multiplications and additions to process the response from \( \mathsf {Clnt}_\mathsf {TP} \) (Step 2d, Figure 7). Each homomorphic operation is polynomial in the security parameter \( \lambda \). Therefore, \( \mathsf {Srv}_\mathsf {TP} \) is \( \text{PPT} \) with overall complexity \( O(n\cdot m \cdot |S|\cdot k\cdot L)\cdot poly(\lambda) \). Next, we analyze \( \mathsf {Clnt}_\mathsf {TP} \). In the computation phase (Figure 6 Step 2), \( \mathsf {Clnt}_\mathsf {TP} \) performs \( O(k\cdot |S|\cdot L) \) operations of \( \mathsf {Dec} \) (Step 2d, Figure 7) and \( O(k) \) operations of \( \mathsf {Enc} \) (Step 2d, Figure 7), and computes \( \mathsf {Gini} \) on cleartext. The time to compute each \( \mathsf {Enc} \) and \( \mathsf {Dec} \) is polynomial in the security parameter \( \lambda \), and the time to compute \( \mathsf {Gini} \) on cleartext is \( O(k\cdot |S|\cdot L) \) (Figure 3). Hence, the complexity of \( \mathsf {Clnt}_\mathsf {TP} \) in the computation phase (Figure 6 Step 2) is \( O(m\cdot k\cdot |S|\cdot L)\cdot poly(\lambda) \). Moreover, the entire computation of \( \mathsf {Clnt}_\mathsf {TP} \) (including generating keys, encrypting the input and decrypting the output in Figure 6, Steps 1 and 3) is polynomial in its input and the security parameter, so \( \mathsf {Clnt}_\mathsf {TP} \) is \( \text{PPT} \).
Finally, we analyze the communication complexity. At each node \( O(k\cdot |S|\cdot L) \) ciphertexts are transmitted, and hence the communication complexity is \( O(m\cdot k\cdot |S|\cdot L)\cdot poly(\lambda) \).

Next we analyze the multiplicative depth required by \( \mathsf {TP} \). First, we note that it is bounded by the multiplicative depth needed to compute \( v.leaf\_value=\sum _{x \in \mathcal {X}} [\![ w_x ]\!] \cdot \mathbf {c_{y_x}} \), which is 1 plus the multiplicative depth of the ciphertexts in \( \mathsf {W}_v \) for nodes \( v \) at depth \( d-1 \). The multiplicative depth of these ciphertexts is \( \left(\log (deg(\phi)) + 1\right) + (d - 1) \), and hence the total multiplicative depth of the protocol is \( \log (deg(\phi)) + d + 1 \).
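The depth bookkeeping above can be captured in a one-line helper; this is our illustrative formula, assuming \( deg(\phi) \) is a power of two, as in the experiments of Section 5:

```python
import math

def training_mult_depth(deg_phi, d):
    """Multiplicative depth required by the training protocol:
    log2(deg(phi)) + 1 levels to evaluate the soft-step polynomial and
    update a weight, d - 1 further levels down the tree, and one final
    level for the leaf sum of w_x * c_{y_x}."""
    return int(math.log2(deg_phi)) + d + 1
```

For the parameters used in the experiments (a degree-16 soft-step and depth-4 trees), this gives a required depth of 9 levels.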

Next we prove that \( \mathsf {TP}=\langle \mathsf {Clnt}_\mathsf {TP},\mathsf {Srv}_\mathsf {TP}\rangle \) is complete. Observe that \( \langle \mathsf {Clnt}_\mathsf {TP},\mathsf {Srv}_\mathsf {TP}\rangle \) homomorphically evaluates the same function as computed in Algorithm 2. So completeness follows immediately from the correctness of \( \mathcal {E} \).

Finally, we prove that \( \mathsf {TP} \) satisfies the privacy condition of Definition 2.5. Assume by contradiction that privacy does not hold for \( \mathsf {TP} \). That is, there exists a PPT distinguisher \( \mathcal {D} \) that chooses \( (\mathcal {X}_0,\mathcal {Y}_0) , (\mathcal {X}_1,\mathcal {Y}_1) \in \mathsf {A} \) with \( |\mathcal {X}_0|=|\mathcal {X}_1| \), and \( |\mathcal {Y}_0|=|\mathcal {Y}_1| \), and a polynomial \( p(\cdot) \) such that for infinitely many \( \lambda \in \mathbb {N} \): (16) \( \begin{align} \begin{split} \Pr &\left[\mathcal {D}(\textsf {view}^\mathsf {TP}_{\mathsf {Srv}_\mathsf {TP}}((\mathcal {X}_1,\mathcal {Y}_1),\bot ,\lambda)) = 1 \right] -\Pr \left[\mathcal {D}(\textsf {view}^\mathsf {TP}_{\mathsf {Srv}_\mathsf {TP}}((\mathcal {X}_0,\mathcal {Y}_0),\bot ,\lambda)) = 1\right] \ge 1/p(\lambda) \end{split} \end{align} \) We show below that given \( \mathcal {D} \) we can construct an adversary \( \mathcal {A} \) that violates the \( \textsf {CPA} \) security of \( \mathcal {E} \).

The adversary \( \mathcal {A} \) participates in \( \mathsf {EXP}^{cpa}_{\mathcal {A},\mathcal {E}} \) as follows:

(1)

Upon receiving \( pk \), execute \( \mathcal {D} \) to obtain \( (\mathcal {X}_0,\mathcal {Y}_0),(\mathcal {X}_1,\mathcal {Y}_1) \), and send them to \( \textsf {Chal} \).

(2)

Upon receiving \( \mathsf {CTXT}\leftarrow \mathsf {Enc}_{pk}(\mathcal {X}_b,\mathcal {Y}_b) \) behave exactly as \( \mathsf {Srv}_\mathsf {TP} \) behaves while executing \( \mathsf {TP} \) upon receiving \( \mathsf {CTXT}=\lbrace \mathbf {c_x},\mathbf {c_{y_x}}\rbrace \) and \( pk \) from \( \mathsf {Clnt}_\mathsf {TP} \), except that every message \( [\![ \mathsf {right}[i, \theta ] ]\!] , [\![ \mathsf {left}[i, \theta ] ]\!] _{i\in [k],\theta \in S} \) sent from \( \mathsf {Srv}_\mathsf {TP} \) to \( \mathsf {Clnt}_\mathsf {TP} \) is answered by \( \mathcal {A} \) as follows: \( \mathcal {A} \) samples uniformly at random \( i \leftarrow [k] \) and \( \theta \leftarrow S \), computes \( \mathsf {Enc}_{pk}(i,\theta) \), and sends this ciphertext to \( \mathsf {Srv}_\mathsf {TP} \) as if it were the response from \( \mathsf {Clnt}_\mathsf {TP} \).

(3)

Run the distinguisher \( \mathcal {D} \) on \( \textsf {view}_{\mathsf {Srv}_\mathsf {TP}} \) (\( \mathsf {Srv}_\mathsf {TP} \)’s view in \( \mathcal {A} \) during step 2) and output whatever \( \mathcal {D} \) outputs.

The adversary \( \mathcal {A} \) is \( \text{PPT} \) due to \( (\mathcal {X}_0,\mathcal {Y}_0),(\mathcal {X}_1,\mathcal {Y}_1) \) being efficiently samplable and \( \mathsf {Srv}_\mathsf {TP} \) and \( \mathcal {D} \) being PPT. Note that \( \mathsf {TP} \) is almost perfectly simulated except that the queries to \( \mathsf {Clnt}_\mathsf {TP} \) are simulated using encryption of randomly sampled elements. Let \( \mathsf {TP}^{\prime } \) denote this variant of \( \mathsf {TP} \) that is simulated by \( \mathcal {A} \), namely \( \mathsf {TP}^{\prime } \) is a protocol identical to \( \mathsf {TP} \) except that each query to \( \mathsf {Clnt}_\mathsf {TP} \) is answered by the encryption of a randomly sampled pair \( (i,\theta) \leftarrow [k]\times S \). We denote by \( \textsf {view}^{\mathsf {EXP}^{cpa}}_{\mathsf {Srv}_\mathsf {TP}}((\mathcal {X}_{b},\mathcal {Y}_{b}),\bot ,\lambda) \) the view of \( \mathsf {Srv}_\mathsf {TP} \), simulated by \( \mathcal {A} \), in the execution of \( \mathsf {EXP}^{cpa}_{\mathcal {A},\mathcal {E}} \) with bit \( b \) being selected by the challenger. By definition of \( \mathsf {TP}^{\prime } \) it holds that for every \( b\in \lbrace 0,1\rbrace \), (17) \( \begin{align} \begin{split} \Pr &\left[\mathcal {D}(\textsf {view}^{\mathsf {TP}^{\prime }}_{\mathsf {Srv}_\mathsf {TP}}((\mathcal {X}_{b},\mathcal {Y}_{b}),\bot ,\lambda)) = 1 \right] =\Pr \left[\mathcal {D}(\textsf {view}^{\mathsf {EXP}^{cpa}}_{\mathsf {Srv}_\mathsf {TP}}((\mathcal {X}_{b},\mathcal {Y}_{b}),\bot ,\lambda))=1\right] \end{split} \end{align} \) Furthermore, the \( \textsf {CPA} \) security of \( \mathcal {E} \) guarantees that the server’s view in \( \mathsf {TP} \) and \( \mathsf {TP}^{\prime } \) is computationally indistinguishable (see Lemma 4.3). 
Putting together Lemma 4.3 and Equation 16 it follows that (18) \( \begin{align} \begin{split} \Pr &[\mathcal {D}(\textsf {view}^{\mathsf {TP}^{\prime }}_{\mathsf {Srv}_\mathsf {TP}}((\mathcal {X}_1,\mathcal {Y}_1),\bot ,\lambda)) = 1 ] -\Pr [\mathcal {D}(\textsf {view}^{\mathsf {TP}^{\prime }}_{\mathsf {Srv}_\mathsf {TP}}((\mathcal {X}_0,\mathcal {Y}_0),\bot ,\lambda)) = 1 ] \ge \frac{1}{p(\lambda)} - {\mathsf {neg}}(\lambda)\ . \end{split} \end{align} \) Next, from Equations 17 and 18 it follows that: (19) \( \begin{align} \begin{split} \Pr &[\mathcal {D}(\textsf {view}^{\mathsf {EXP}^{cpa}}_{\mathsf {Srv}_\mathsf {TP}}((\mathcal {X}_1,\mathcal {Y}_1),\bot ,\lambda))=1] -\Pr [\mathcal {D}(\textsf {view}^{\mathsf {EXP}^{cpa}}_{\mathsf {Srv}_\mathsf {TP}}((\mathcal {X}_0,\mathcal {Y}_0),\bot ,\lambda))=1] \ge \frac{1}{p(\lambda)}-{\mathsf {neg}}(\lambda). \end{split} \end{align} \) Therefore, we obtain that:

\( \begin{align*} \begin{split} \Pr &\left[\mathsf {EXP}^{cpa}_{\mathcal {A},\mathcal {E}}(\lambda) = 1\right]\\ =& \frac{1}{2} \cdot \left(\Pr [\mathsf {EXP}^{cpa}_{\mathcal {A},\mathcal {E}}(\lambda) = 1|b=1] + \Pr [\mathsf {EXP}^{cpa}_{\mathcal {A},\mathcal {E}}(\lambda) = 1|b=0]\right)\\ =& \frac{1}{2} \cdot \left(\Pr \left[\mathcal {D}(\textsf {view}^{\mathsf {EXP}^{cpa}}_{\mathsf {Srv}_\mathsf {TP}}((\mathcal {X}_1,\mathcal {Y}_1),\bot ,\lambda))=1\right] + 1 - \Pr \left[\mathcal {D}(\textsf {view}^{\mathsf {EXP}^{cpa}}_{\mathsf {Srv}_\mathsf {TP}}((\mathcal {X}_0,\mathcal {Y}_0),\bot ,\lambda))=1\right]\right) \ge \frac{1}{2} + \frac{1}{2}\cdot \left(\frac{1}{p(\lambda)} - {\mathsf {neg}}(\lambda)\right) \end{split} \end{align*} \) where the last inequality follows from Equation 19. Combining this with \( \mathcal {A} \) being \( \text{PPT} \), we derive a contradiction to \( \mathcal {E} \) being \( \textsf {CPA} \) secure. This concludes the proof.

Let \( \mathsf {TP}^{\prime }=\langle \mathsf {Clnt}_\mathsf {TP}^{\prime },\mathsf {Srv}_\mathsf {TP}\rangle \) be as defined in the proof of Theorem 4.2, i.e., it is identical to \( \mathsf {TP}=\langle \mathsf {Clnt}_\mathsf {TP},\mathsf {Srv}_\mathsf {TP}\rangle \) except that \( \mathsf {Clnt}_\mathsf {TP}^{\prime } \) samples \( (i,\theta) \leftarrow [k]\times S \) at random instead of executing Step 2d, Figure 7. We show that the server is indifferent to the correctness of the answers it receives from the client, in the sense that its views in \( \mathsf {TP} \) and \( \mathsf {TP}^{\prime } \) are indistinguishable.

Lemma 4.3.

For every efficiently samplable \( (\mathcal {X},\mathcal {Y})\in \mathsf {A} \), and all \( \lambda \in \mathbb {N} \) the following holds: \( \begin{equation*} \textsf {view}^{\mathsf {TP}^{\prime }}_{\mathsf {Srv}_\mathsf {TP}}((\mathcal {X},\mathcal {Y}),\bot ,\lambda) \approx _c \textsf {view}^\mathsf {TP}_{\mathsf {Srv}_\mathsf {TP}}((\mathcal {X},\mathcal {Y}),\bot ,\lambda) \end{equation*} \)

Proof. Assume by contradiction that Lemma 4.3 does not hold. That is, there exists a \( \text{PPT} \) distinguisher \( \mathcal {D} \) that chooses \( (\mathcal {X},\mathcal {Y})\in \mathsf {A} \) and a polynomial \( p(\cdot) \) such that for infinitely many \( \lambda \in \mathbb {N} \): (20) \( \begin{align} \begin{split} \Pr &\left[\mathcal {D}(\textsf {view}^{\mathsf {TP}^{\prime }}_{\mathsf {Srv}_\mathsf {TP}}((\mathcal {X},\mathcal {Y}),\bot ,\lambda)) = 1 \right] -\Pr \left[\mathcal {D}(\textsf {view}^\mathsf {TP}_{\mathsf {Srv}_\mathsf {TP}}((\mathcal {X},\mathcal {Y}),\bot ,\lambda)) = 1\right] \ge 1/p(\lambda)\ . \end{split} \end{align} \)

We define a series of hybrid executions that gradually move between \( \mathsf {TP}=\langle \mathsf {Clnt}_\mathsf {TP},\mathsf {Srv}_\mathsf {TP}\rangle \) execution (where \( \mathsf {Gini} \) impurity is used) to \( \mathsf {TP}^{\prime }=\langle \mathsf {Clnt}^{\prime }_\mathsf {TP},\mathsf {Srv}_\mathsf {TP}\rangle \) execution (where random \( i \) and \( \theta \) are used). Let \( q \) denote the number of queries made to \( \mathsf {Clnt}_\mathsf {TP} \) in Figure 6 (\( \mathsf {Srv}_\mathsf {TP} \) makes a single query to \( \mathsf {Clnt}_\mathsf {TP} \) per constructed node in \( \mathsf {T} \)). We define \( q+1 \) hybrids as follows:

  • \( \mathbf{Hybrid}\;\; \mathsf {H}_0 \)is defined as the execution of \( \langle \mathsf {Clnt}_\mathsf {TP},\mathsf {Srv}_\mathsf {TP}\rangle \).

  • \( \mathbf{Hybrid}\;\; \mathsf {H}_j ( j=1,\ldots ,q ) \) is similar to \( \mathsf {H}_0 \) except that the last \( j \) queries to \( \mathsf {Clnt}_\mathsf {TP} \) are answered by sampling uniformly random \( (i,\theta)\leftarrow [k]\times S \) and responding with \( \mathsf {Enc}_{pk}(i,\theta) \) (instead of executing 2b in Figure 7).

Note that in each pair of adjacent hybrids \( \mathsf {H}_{j-1} \) and \( \mathsf {H}_{j} \) for \( j\in [q] \) the difference is that in \( \mathsf {H}_{j} \) the \( (q+1-j) \)’th query is answered using random \( (i,\theta) \) instead of those maximizing the \( \mathsf {Gini} \) impurity. Denote by \( \textsf {view}^{\mathsf {H}_j}_{\mathsf {Srv}_\mathsf {TP}}((\mathcal {X},\mathcal {Y}),\bot ,\lambda) \) the view of \( \mathsf {Srv}_\mathsf {TP} \) in the hybrid \( \mathsf {H}_j \).

By the hybrid argument, it follows from Equation 20 that there exists \( j\in [q] \) such that: (21) \( \begin{align} \begin{split} \Pr &\left[\mathcal {D}(\textsf {view}^{\mathsf {H}_j}_{\mathsf {Srv}_\mathsf {TP}}((\mathcal {X},\mathcal {Y}),\bot ,\lambda)) = 1 \right] -\Pr \left[\mathcal {D}(\textsf {view}^{\mathsf {H}_{j-1}}_{\mathsf {Srv}_\mathsf {TP}}((\mathcal {X},\mathcal {Y}),\bot ,\lambda)) = 1\right] \ge \frac{1}{q}\cdot \frac{1}{p(\lambda)} \end{split} \end{align} \)

We show that Equation 21 contradicts \( \mathcal {E} \) being \( \textsf {CPA} \) secure. That is, we construct an adversary \( \mathcal {A} \) that communicates with the challenger \( \textsf {Chal} \) in the CPA indistinguishability experiment \( \mathsf {EXP}^{cpa}_{\mathcal {A},\mathcal {E}} \) and forces the output to be 1 with a non-negligible advantage over half. Concretely, \( \mathcal {A} \) participates in \( \mathsf {EXP}^{cpa}_{\mathcal {A},\mathcal {E}} \) as follows:

(1)

\( \mathcal {A} \) executes Algorithm 2 on \( (\mathcal {X},\mathcal {Y}) \) to compute for each \( v\in \mathsf {T} \), the associated feature \( v.feature \) and threshold \( v.\theta \).

(2)

Upon receiving \( pk \) from \( \textsf {Chal} \), \( \mathcal {A} \) computes \( \mathsf {CTXT}\leftarrow \mathsf {Enc}_{pk}(\mathcal {X},\mathcal {Y}) \) and executes \( \mathsf {Srv}_\mathsf {TP} \) on \( (\mathsf {CTXT}, pk) \) while answering each query that \( \mathsf {Srv}_\mathsf {TP} \) makes as follows:

(a)

For the \( q-j \) first queries of \( \mathsf {Srv}_\mathsf {TP} \), \( \mathcal {A} \) encrypts under \( pk \) the feature \( v.feature \) and threshold \( v.\theta \) associated with the queried node \( v \), and sends the resulting ciphertexts to \( \mathsf {Srv}_\mathsf {TP} \).

(b)

For the \( (q-j+1) \)’th query of \( \mathsf {Srv}_\mathsf {TP} \), \( \mathcal {A} \) proceeds as follows:

(i)

Denote by \( (i_0,\theta _0) \) the feature and threshold associated with the queried node. \( \mathcal {A} \) samples uniformly random \( (i_1,\theta _1) \leftarrow [k]\times S \), and sends \( (i_0,\theta _0) \) and \( (i_1,\theta _1) \) to \( \textsf {Chal} \).

(ii)

Upon receiving from \( \textsf {Chal} \) the challenge ciphertext \( c \leftarrow Enc_{pk}(i_b,\theta _b) \) for uniformly random \( b\leftarrow \lbrace 0,1\rbrace \), \( \mathcal {A} \) forwards this ciphertext \( c \) to \( \mathsf {Srv}_\mathsf {TP} \).

(c)

For the rest of the queries, \( \mathcal {A} \) samples uniformly random \( (i,\theta)\leftarrow [k]\times S \), and sends \( \mathsf {Enc}_{pk}(i,\theta) \) to \( \mathsf {Srv}_\mathsf {TP} \).

(3)

\( \mathcal {A} \) executes the distinguisher \( \mathcal {D} \) on the view of \( \mathsf {Srv}_\mathsf {TP} \) during the execution of Step 2 above, denoted \( \textsf {view}_{\mathsf {Srv}_\mathsf {TP}} \), and outputs whatever \( \mathcal {D} \) outputs.

We note that if \( b=0 \), then the challenge ciphertext \( c \) is the encryption of \( (v.feature,v.\theta) \) and \( \textsf {view}_{\mathsf {Srv}_\mathsf {TP}} \) is exactly as in \( \mathsf {H}_{j-1} \), and otherwise as in \( \mathsf {H}_j \). Therefore, we obtain that (22) \( \begin{align} \begin{split} \Pr &\left[\mathsf {EXP}^{cpa}_{\mathcal {A},\mathcal {E}}(\lambda) = 1\right]\\ =& \frac{1}{2} \cdot \left(\Pr [\mathsf {EXP}^{cpa}_{\mathcal {A},\mathcal {E}}(\lambda) = 1|b=1] + \Pr [\mathsf {EXP}^{cpa}_{\mathcal {A},\mathcal {E}}(\lambda) = 1|b=0]\right)\\ =& \frac{1}{2} \cdot \left(\Pr \left[\mathcal {D}\left(\textsf {view}^{\mathsf {H}_j}_{\mathsf {Srv}_\mathsf {TP}}((\mathcal {X},\mathcal {Y}),\bot ,\lambda)\right) = 1\right] + 1 - \Pr \left[\mathcal {D}\left(\textsf {view}^{\mathsf {H}_{j-1}}_{\mathsf {Srv}_\mathsf {TP}}\left((\mathcal {X},\mathcal {Y}),\bot ,\lambda \right)\right) = 1\right]\right) \ge \frac{1}{2}+ \frac{1}{2}\cdot \frac{1}{q\cdot p(\lambda)} \end{split} \end{align} \) which concludes the proof.


5 IMPLEMENTATION DETAILS AND EXPERIMENTAL RESULTS

We empirically evaluated our decision tree algorithms and protocols for both accuracy and run-time performance on encrypted data (see Sections 5.1-5.2, respectively). Our empirical evaluation was done with respect to a single decision tree, and can naturally be extended to random forests, where trees are trained/evaluated in parallel. The soft-step functions we employed are polynomials of degree 8 and 16, for prediction and training over encrypted data, respectively, constructed via Equation 2 with a weighting function \( w:[-2,2]\rightarrow [0,1] \) defined to be zero in the interval \( [-0.2, 0.2] \) and a constant positive value elsewhere. For training, we used thresholds on a 0.05 grid in the \( [-1,1] \) interval. We used standard UCI repository datasets [20] in our evaluation, ranging in size from very small (iris, with 4 features and 150 examples) to the moderate size common in real-world applications (forest cover, with 54 features and over half a million examples).

5.1 Accuracy of our Decision-Tree Algorithms

The accuracy of our Algorithms 1-2 was evaluated in comparison to standard trees, on the benchmark datasets. We used a 3-fold cross-validation procedure, where each dataset was randomly partitioned into three equal-size parts, and each of the three parts serves as a test-set for a classifier trained on the remaining two. The overall accuracy is calculated as the percentage of correct classification on test examples (each example in the data is taken exactly once, so the accuracy reported is simply the percentage of examples that were correctly classified).
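The cross-validation procedure can be sketched as follows. This is our illustrative harness: `train_fn` and `predict_fn` are placeholders standing in for either our algorithms or the standard ones.

```python
import numpy as np

def three_fold_accuracy(X, y, train_fn, predict_fn, seed=0):
    """3-fold cross-validation as in the evaluation: shuffle, split into
    three equal-size parts, hold out each part once as the test set, and
    report the overall fraction of correctly classified held-out examples."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, 3)
    correct = 0
    for i in range(3):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(3) if j != i])
        model = train_fn(X[train], y[train])
        correct += int(np.sum(predict_fn(model, X[test]) == y[test]))
    return correct / len(X)
```

Since every example appears in exactly one test fold, the returned value is exactly the percentage of correctly classified examples described above.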

We compared all four possible combinations of training and prediction according to our algorithms vs. the standard algorithms; see Figure 8. We trained trees up to depth 5, as customary when using random forests.

Fig. 8.

Fig. 8. Accuracy of ours vs. scikit-learn [40] algorithms on four UCI datasets and tree depth 0–5 (depth 0 is the majority-class baseline), in four execution modes: our training and prediction (red), our training and scikit-learn prediction (green), scikit-learn training and our prediction (orange), scikit-learn training and prediction (blue).

The results show overall comparable accuracy, indicating that our Algorithms 1-2 are valid replacements for standard decision trees in terms of accuracy. In three out of the four datasets (Cancer, Iris and Wine), the accuracy of our algorithm in fact outperforms that of scikit-learn. This can be explained by the fact that our algorithm assigns intermediate values to samples in the proximity of the threshold (cf. 0-1 values in scikit-learn), which is a good heuristic, as samples close to the threshold are often less indicative. In contrast, on the Cover dataset our algorithm performs slightly worse than scikit-learn (a few percentage points of difference on depth 4-5 trees); this is not surprising, since all the examined algorithms are heuristics for which bad datasets are known to exist. It seems indeed that the Cover dataset, which is richer than the others we tested, with a larger number of features and labels, is not as well captured by a tree model, be it ours or scikit-learn's. We remark that while our algorithm attains accuracy comparable to the standard algorithm, we recommend using it only on datasets where the standard decision tree algorithm is sufficiently accurate.

Our method can employ various polynomial approximations for the step-function, including \( \ell _\infty \) (Remez) and \( \ell _2 \) (MSE) approximations. We measured accuracy on the UCI datasets (Iris, Wine, Cancer and Cover) using both approximation methods with various degrees and neglected window widths. Our experiments indicate that our method yields high accuracy with both Remez and MSE approximations for the step function; see Figure 9. MSE has the advantage, however, of avoiding the numeric instability associated with implementations of Remez polynomials.

Fig. 9.

Fig. 9. Accuracy ( \( y \) -axis) of degree 16 step-function MSE and Remez approximations on various neglected window widths ( \( x \) -axis), tested on four UCI datasets and trees of depth 4, in four execution modes: training with step-function MSE and prediction with scikit-learn (blue), training and prediction with step-function MSE (orange), training with step-function Remez and prediction with scikit-learn (cyan), training and prediction with step-function Remez (green).

Increasing the degree of the approximating polynomial initially improves accuracy until saturation is reached, where further increase essentially does not improve accuracy and may even harm it. The saturation degree, in both MSE and Remez approximations, is 4 on Iris dataset, 8 on Wine and Cancer, and 16 on Cover (see Figure 10).

Fig. 10.

Fig. 10. Accuracy ( \( y \) -axis) using MSE (left) and Remez (right) approximation polynomials of degrees 4,8,16,24 with neglected window width 0.1, performed on trees of depth 4 and four UCI datasets: Iris (blue), wine (orange), cancer (gray) and cover (yellow) datasets.

5.2 Running-Time on Encrypted Data

We implemented our training and prediction protocols over data encrypted with the CKKS homomorphic encryption scheme [14] in Microsoft SEAL v3.3.2 [46]. We ran experiments to measure our performance for training and prediction on encrypted data, with full binary trees of depth \( d=4 \) (i.e., \( m=31 \) nodes); furthermore, we measured the performance of evaluating the soft-step approximation over encrypted data. All runs are with a 128-bit security level.

Training over encrypted datasets. Our experiments were executed on AWS x1.16xlarge as the server. The degree of the cyclotomic polynomial was set in SEAL to 16384, with 35 bits of precision for the plaintext moduli at each level of homomorphic computation. The input examples are encrypted feature-by-feature, while packing 8192 data values in each ciphertext; the associated labels (in 1-hot encoding) are likewise encrypted in packed form. The server run-time on encrypted UCI datasets (sub-sampled) ranged from less than an hour on the smallest dataset to a day and a half on the largest one; see Table 1.

The client (e.g., KMS) run-time in all experiments ranged from seconds to under three minutes. The communication complexity is independent of the dataset size. See Figure 11 for the number of transmitted ciphertexts. The ciphertext size is roughly 0.5MB after compression.

Fig. 11.

Fig. 11. Total number of transmitted ciphertexts vs. the training set size, when training a full tree of depth 4 on various (features, labels, splits) settings.

Prediction over encrypted data. We ran our prediction protocol on encrypted examples and cleartext trees. The degree of the cyclotomic polynomial was set to 8192, with 22 bits of precision for the plaintext moduli at each level of homomorphic computation. No packing was employed. The experiments were executed on a 2.3 GHz 8-Core Intel Core i9 personal computer, exhibiting average run-time of 2.3 seconds for evaluating the entire 31-nodes tree over the encrypted sample (standard deviation of 0.04 seconds). The amortized run-time for batched prediction is \( 0.56 \) milliseconds per sample, when packing 4,096 feature values in each ciphertext, as supported by the parameters in these experiments.
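The amortized figure follows directly from the batch timing; a quick sanity check of the arithmetic:

```python
# With ~2.3 s to evaluate the 31-node tree over one batch, and 4096
# feature values packed per ciphertext, each packed sample amortizes to:
batch_seconds = 2.3
packed_samples = 4096
amortized_ms = batch_seconds / packed_samples * 1000.0
print(round(amortized_ms, 2))  # prints 0.56 (ms per sample)
```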

Evaluating soft-step function over encrypted data. Homomorphic evaluation of our soft-step function takes 0.12 (respectively, 0.43) seconds when executed with parameters as in our experiments for prediction (respectively, training) over encrypted data. The increase in run-time is primarily due to the increase in the cyclotomic polynomial degree from 8192 to 16384, whereas only a minor run-time increase is incurred by raising the degree of the polynomial employed as our soft-step approximation from 8 to 16; see Table 2. The amortized run-time exhibits a more moderate increase, from 30.5 \( \mu \)s for each homomorphic evaluation of the soft-step function during prediction, to 52.5 \( \mu \)s in training. This moderation is due to packing twice as many plaintext values in each ciphertext when using a cyclotomic polynomial degree of 16384, as in training, rather than 8192, as in prediction. The width of the neglected window has no effect on the run-time, as it only changes the values of the polynomial coefficients, but not their quantity or evaluation time.


6 CONCLUSIONS

In this work we present \( \mathsf {FHE } \)-friendly algorithms and protocols for decision tree based prediction and training over encrypted data that are the first to attain all the following desired properties: non-interactive prediction, lightweight client and communication in training, and rigorous privacy guarantee. We ran extensive experiments on standard UCI and synthetic datasets, all encrypted with fully homomorphic encryption, demonstrating high accuracy comparable to standard algorithms on cleartext data, fast prediction and feasible training. Our protocols support real-life enterprise use-cases, and are well suited for offline execution, e.g., in nightly batched prediction.


ACKNOWLEDGMENTS

The authors wish to thank Boaz Sapir for his tremendous contribution in conducting the latest experiments on SEAL and Yaron Sheffer for sharing his knowledge on industry architecture and best practices and for helpful discussions.

Footnotes

  1. 1 An extended abstract of this work was published in ECML/PKDD 2020(1):145-161 [2]. Preprints appeared in IACR Cryptology ePrint Archive Reports 2019/1282 [1] and 2021/803 [3].

  2. 2 Finite field homomorphic computations are required due to their reliance on Fermat’s Little Theorem for comparison. Their specialized encoding is as follows: Lue et al. [38] encode data \( a \) by a monomial \( X^a \), Tueno et al. [48] employ the Lin-Tzeng [35] encoding that reduces Greater-Than to Set-Intersection.


REFERENCES

  [1] Akavia Adi, Leibovich Max, Resheff Yehezkel S., Ron Roey, Shahar Moni, and Vald Margarita. 2019. Privacy-Preserving Decision Tree Training and Prediction against Malicious Server. Cryptology ePrint Archive, Report 2019/1282. (2019).
  [2] Akavia Adi, Leibovich Max, Resheff Yehezkel S., Ron Roey, Shahar Moni, and Vald Margarita. 2020. Privacy-preserving decision trees training and prediction. In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2020, Ghent, Belgium, September 14-18, 2020, Proceedings, Part I (Lecture Notes in Computer Science), Hutter Frank, Kersting Kristian, Lijffijt Jefrey, and Valera Isabel (Eds.), Vol. 12457. Springer, 145–161.
  [3] Akavia Adi, Leibovich Max, Resheff Yehezkel S., Ron Roey, Shahar Moni, and Vald Margarita. 2021. Privacy-Preserving Decision Trees Training and Prediction. Cryptology ePrint Archive, Report 2021/768. (2021). https://eprint.iacr.org/2021/768.
  [4] Barni Mauro, Failla Pierluigi, Kolesnikov Vladimir, Lazzeretti Riccardo, Sadeghi Ahmad-Reza, and Schneider Thomas. 2009. Secure evaluation of private linear branching programs with medical applications. In European Symposium on Research in Computer Security. Springer, 424–439.
  [5] Blatt Marcelo, Gusev Alexander, Polyakov Yuriy, Rohloff Kurt, and Vaikuntanathan Vinod. 2019. Optimized Homomorphic Encryption Solution for Secure Genome-Wide Association Studies. Cryptology ePrint Archive, Report 2019/223. (2019). https://eprint.iacr.org/2019/223.
  [6] Bost Raphael, Popa Raluca Ada, Tu Stephen, and Goldwasser Shafi. 2015. Machine learning classification over encrypted data. In NDSS, Vol. 4324. 4325.
  [7] Boura Christina, Gama Nicolas, Georgieva Mariya, and Jetchev Dimitar. 2020. CHIMERA: Combining ring-LWE-based fully homomorphic encryption schemes. J. Math. Cryptol. 14, 1 (2020), 316–338.
  [8] Brakerski Zvika. 2012. Fully homomorphic encryption without modulus switching from classical GapSVP. In Advances in Cryptology - CRYPTO 2012 - 32nd Annual Cryptology Conference, Santa Barbara, CA, USA, August 19–23, 2012. Proceedings. 868–886.
  [9] Brakerski Zvika, Gentry Craig, and Vaikuntanathan Vinod. 2012. (Leveled) fully homomorphic encryption without bootstrapping. In Innovations in Theoretical Computer Science 2012, Cambridge, MA, USA, January 8–10, 2012. 309–325.
  [10] Brickell Justin, Porter Donald E., Shmatikov Vitaly, and Witchel Emmett. 2007. Privacy-preserving remote diagnostics. In Proceedings of the 14th ACM Conference on Computer and Communications Security. ACM, 498–507.
  [11] Cetin Gizem S., Doroz Yarkin, Sunar Berk, and Martin William J. 2015. Arithmetic Using Word-Wise Homomorphic Encryption. Cryptology ePrint Archive, Report 2015/1195. (2015). https://eprint.iacr.org/2015/1195.
  [12] Chen Hao, Gilad-Bachrach Ran, Han Kyoohyung, Huang Zhicong, Jalali Amir, Laine Kim, and Lauter Kristin. 2018. Logistic regression over encrypted data from fully homomorphic encryption. BMC Medical Genomics 11, 4 (2018), 81.
  [13] Chen Hao, Gilad-Bachrach Ran, Kyoohyung Han, Huang Zhicong, Jalali Amir, Laine Kim, and Lauter Kristin. 2018. Logistic regression over encrypted data from fully homomorphic encryption. BMC Medical Genomics 11 (10 2018).
  [14] Cheon Jung Hee, Kim Andrey, Kim Miran, and Song Yong Soo. 2017. Homomorphic encryption for arithmetic of approximate numbers. In Advances in Cryptology - ASIACRYPT 2017 - 23rd International Conference on the Theory and Applications of Cryptology and Information Security, Hong Kong, China, December 3–7, 2017, Proceedings, Part I. 409–437.
  [15] Cheon Jung Hee, Kim Dongwoo, and Kim Duhyeong. 2020. Efficient homomorphic comparison methods with optimal complexity. In Advances in Cryptology - ASIACRYPT 2020, Moriai Shiho and Wang Huaxiong (Eds.). Springer International Publishing, Cham, 221–256.
  [16] Cheon Jung Hee, Kim Dongwoo, Kim Duhyeong, Lee Hun Hee, and Lee Keewoo. 2019. Numerical method for comparison on homomorphically encrypted numbers. In Advances in Cryptology - ASIACRYPT 2019, Galbraith Steven D. and Moriai Shiho (Eds.). Springer International Publishing, Cham, 415–445.
  [17] Cock Martine De, Dowsley Rafael, Horst Caleb, Katti Raj, Nascimento Anderson C. A., Poon Wing-Sea, and Truex Stacey. 2017. Efficient and private scoring of decision trees, support vector machines and logistic regression models based on pre-computation. IEEE Transactions on Dependable and Secure Computing 16, 2 (2017), 217–230.
  [18] Hoogh Sebastiaan de, Schoenmakers Berry, Chen Ping, and Akker Harm op den. 2014. Practical secure decision tree learning in a teletreatment application. In International Conference on Financial Cryptography and Data Security. Springer, 179–194.
  [19] Du Wenliang and Zhan Zhijun. 2002. Building decision tree classifier on private data. In Proceedings of the IEEE International Conference on Privacy, Security and Data Mining - Volume 14. Australian Computer Society, Inc., 1–8.
  [20] Dua Dheeru and Graff Casey. 2017. UCI Machine Learning Repository. (2017).
  [21] Emekci F., Sahin O. D., Agrawal D., and Abbadi A. El. 2007. Privacy preserving decision tree learning over multiple parties. Data Knowl. Eng. 63, 2 (Nov. 2007), 348–361.
  [22] Fan Junfeng and Vercauteren Frederik. 2012. Somewhat practical fully homomorphic encryption. IACR Cryptology ePrint Archive 2012 (2012), 144.
  [23] Gentry Craig. 2009. A Fully Homomorphic Encryption Scheme. Ph.D. Dissertation. Stanford University. crypto.stanford.edu/craig.
  [24] Gilad-Bachrach Ran, Dowlin Nathan, Laine Kim, Lauter Kristin, Naehrig Michael, and Wernsing John. 2016. Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning. 201–210.
  25. [25] Hazay Carmit and Lindell Yehuda. 2010. Efficient Secure Two-Party Protocols: Techniques and Constructions (1st ed.). Springer-Verlag, Berlin.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. [26] Hesamifard Ehsan, Takabi Hassan, Ghasemi Mehdi, and Wright Rebecca. 2018. Privacy-preserving machine learning as a service. Proceedings on Privacy Enhancing Technologies 2018 (06 2018), 123142.Google ScholarGoogle ScholarCross RefCross Ref
  27. [27] Joye Marc and Salehi Fariborz. 2018. Private yet efficient decision tree evaluation. In IFIP Annual Conference on Data and Applications Security and Privacy. Springer, 243259.Google ScholarGoogle ScholarCross RefCross Ref
  28. [28] Kim Andrey, Song Yongsoo, Kim Miran, Lee Keewoo, and Cheon Jung. 2018. Logistic regression model training based on the approximate homomorphic encryption. BMC Medical Genomics 11 (10 2018).Google ScholarGoogle ScholarCross RefCross Ref
  29. [29] Kim Andrey, Song Yongsoo, Kim Miran, Lee Keewoo, and Cheon Jung Hee. 2018. Logistic regression model training based on the approximate homomorphic encryption. BMC Medical Genomics 11, 4 (2018), 83.Google ScholarGoogle ScholarCross RefCross Ref
  30. [30] Kim Miran, Song Yongsoo, Wang Shuang, Xia Yuhou, and Jiang Xiaoqian. 2017. Secure logistic regression based on homomorphic encryption: Design and evaluation. JMIR Medical Informatics 6 (08 2017).Google ScholarGoogle Scholar
  31. [31] Kiss Ágnes, Naderpour Masoud, Liu Jian, Asokan N., and Schneider Thomas. 2019. SoK: Modular and efficient private decision tree evaluation. PoPETs 2019, 2 (2019), 187208.Google ScholarGoogle Scholar
  32. [32] Kyoohyung Han, Hong Seungwan, Cheon Jung, and Park Daejun. 2019. Logistic regression on homomorphic encrypted data at scale. Proceedings of the AAAI Conference on Artificial Intelligence 33 (07 2019), 94669471.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. [33] Laurent Hyafil and Rivest Ronald L.. 1976. Constructing optimal binary decision trees is NP-complete. Information Processing Letters 5, 1 (1976), 1517.Google ScholarGoogle ScholarCross RefCross Ref
  34. [34] Lee Eunsang, Lee Joon-Woo, Kim Young-Sik, and No Jong-Seon. 2021. Minimax approximation of sign function by composite polynomial for homomorphic comparison. IEEE Transactions on Dependable and Secure Computing (2021), 11. Google ScholarGoogle ScholarCross RefCross Ref
  35. [35] Lin Hsiao-Ying and Tzeng Wen-Guey. 2005. An efficient solution to the millionaires’ problem based on homomorphic encryption. In International Conference on Applied Cryptography and Network Security. Springer, 456466.Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. [36] Lindell Yehuda and Pinkas Benny. 2000. Privacy preserving data mining. In Annual International Cryptology Conference. Springer, 3654.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. [37] Lory Peter. 2012. Enhancing the efficiency in privacy preserving learning of decision trees in partitioned databases. In Privacy in Statistical Databases, Domingo-Ferrer Josep and Tinnirello Ilenia (Eds.). Springer Berlin, Berlin,322335.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. [38] Lu Wen-jie, Zhou Jun-Jie, and Sakuma Jun. 2018. Non-interactive and output expressive private comparison from homomorphic encryption. In Proceedings of the 2018 on Asia Conference on Computer and Communications Security. 6774.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. [39] Nandakumar Karthik, Ratha Nalini K., Pankanti Sharath, and Halevi Shai. 2019. Towards deep neural network training on encrypted data. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2019, Long Beach, CA, USA, June 16–20, 2019. Computer Vision Foundation/IEEE, 0.Google ScholarGoogle ScholarCross RefCross Ref
  40. [40] Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V., Vanderplas J., Passos A., Cournapeau D., Brucher M., Perrot M., and Duchesnay E.. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12 (2011), 28252830.Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. [41] Quinlan J. Ross. 1986. Induction of decision trees. Machine Learning 1, 1 (1986), 81106.Google ScholarGoogle Scholar
  42. [42] Remez Eugene Y.. 1934. Sur la détermination des polynômes d’approximation de degré donnée. Comm. Soc. Math. Kharkov 10, 4163 (1934), 196.Google ScholarGoogle Scholar
  43. [43] Rivest R. L., Adleman L., and Dertouzos M. L.. 1978. On data banks and privacy homomorphisms. Foundations of Secure Computation, Academia Press (1978), 169179.Google ScholarGoogle Scholar
  44. [44] Rivlin Theodore J.. 2003. An Introduction to the Approximation of Functions. Courier Corporation.Google ScholarGoogle Scholar
  45. [45] Samet Saeed and Miri Ali. 2008. Privacy preserving ID3 using Gini index over horizontally partitioned data. In Proceedings of the 2008 IEEE/ACS International Conference on Computer Systems and Applications (AICCSA’08). IEEE Computer Society, Washington, DC, USA, 645651.Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. [46] SEAL 2019. Microsoft SEAL (Release 3.3). https://github.com/Microsoft/SEAL. (June 2019). Microsoft Research, Redmond, WA.Google ScholarGoogle Scholar
  47. [47] Tai Raymond K. H., Ma Jack P. K., Zhao Yongjun, and Chow Sherman S. M.. 2017. Privacy-preserving decision trees evaluation via linear functions. In European Symposium on Research in Computer Security. Springer, 494512.Google ScholarGoogle Scholar
  48. [48] Tueno Anselme, Boev Yordan, and Kerschbaum Florian. 2020. Non-interactive private decision tree evaluation. In Data and Applications Security and Privacy XXXIV, Singhal Anoop and Vaidya Jaideep (Eds.). Springer International Publishing, Cham, 174194.Google ScholarGoogle ScholarCross RefCross Ref
  49. [49] Tueno Anselme, Kerschbaum Florian, and Katzenbeisser Stefan. 2019. Private evaluation of decision trees using sublinear cost. PoPETs 2019, 1 (2019), 266286.Google ScholarGoogle Scholar
  50. [50] Vaidya Jaideep, Clifton Chris, Kantarcioglu Murat, and Patterson A. Scott. 2008. Privacy-preserving decision trees over vertically partitioned data. ACM Transactions on Knowledge Discovery from Data (TKDD) 2, 3 (2008), 14.Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. [51] Wang Ke, Xu Yabo, She Rong, and Yu Philip S.. 2006. Classification spanning private databases. In Proceedings, The Twenty-First National Conference on Artificial Intelligence and the Eighteenth Innovative Applications of Artificial Intelligence Conference, July 16–20, 2006, Boston, Massachusetts, USA. AAAI Press, 293298.Google ScholarGoogle Scholar
  52. [52] Wu David J., Feng Tony, Naehrig Michael, and Lauter Kristin. 2016. Privately evaluating decision trees and random forests. Proceedings on Privacy Enhancing Technologies 2016, 4 (2016), 335355.Google ScholarGoogle Scholar
  53. [53] Xiao Ming-Jun, Huang Liu-Sheng, Luo Yong-Long, and Shen Hong. 2005. Privacy preserving ID3 algorithm over horizontally partitioned data. In Proceedings of the Sixth International Conference on Parallel and Distributed Computing Applications and Technologies (PDCAT’05). IEEE Computer Society, Washington, DC, USA, 239243.Google ScholarGoogle ScholarDigital LibraryDigital Library
