Abstract
With the advent of the Internet of things (IoT), billions of devices are expected to continuously collect and process sensitive data (e.g., location, personal health factors). Due to the limited computational capacity available on IoT devices, the current de facto model for building IoT applications is to send the gathered data to the cloud for computation. While building private cloud infrastructures for handling large amounts of data streams can be expensive, using low-cost public (untrusted) cloud infrastructures for processing continuous queries including sensitive data leads to strong concerns over data confidentiality.
This article presents C3PO, a confidentiality-preserving, continuous query processing engine, that leverages the public cloud. The key idea is to intelligently utilize partially homomorphic and property-preserving encryption to perform as many computationally intensive operations as possible—without revealing plaintext—in the untrusted cloud. C3PO provides simple abstractions to the developer to hide the complexities of applying complex cryptographic primitives, reasoning about the performance of such primitives, deciding which computations can be executed in an untrusted tier, and optimizing cloud resource usage. An empirical evaluation with several benchmarks and case studies shows the feasibility of our approach. We consider different classes of IoT devices that differ in their computational and memory resources (from a Raspberry Pi 3 to a very small device with a Cortex-M3 microprocessor) and through the use of optimizations, we demonstrate the feasibility of using partially homomorphic and property-preserving encryption on IoT devices.
1 INTRODUCTION
The ubiquity of computing devices is driving a massive increase in the amount of data generated by humans and machines. With the advent of the IoT, many more billions of devices are expected to continuously collect sensitive data and compute on it, promising improvements in various sectors. For instance, improvements in sensors and increasingly practical wearable devices allow complex, automatic, and real-time health monitoring [41]. Such monitoring is beneficial by providing patients direct information on their current health status, facilitating diagnosis and treatment, and reducing costs of interventions and risks.
1.1 Cloud-backed IoT and Confidentiality
Due to limited storage and computation capacity available on IoT devices, the current de facto model for building IoT applications is to send the data gathered from physical devices to the cloud for both computation and storage (e.g., SmartThings,1 Nest2). Many IoT applications, therefore, leverage the cloud to compute on data streams from a large number of devices. For example, in smart health, healthcare providers can remotely monitor a larger number of patients, correlate data on a bigger scale, and detect abnormalities in health conditions at an early stage, which cannot be achieved with existing local infrastructure due to resource limitations. For instance, Yang et al. [64] propose a system that collects and displays real-time patient data and show how using the cloud alleviates issues of cross-platform deployment. Health service providers can also be separate from healthcare providers.
Due to the sheer amount of streaming data, building a private cloud infrastructure or expanding local infrastructure to support a large number of devices [36] is very expensive compared to using a low-cost public (untrusted) cloud infrastructure such as Amazon EC2 or Microsoft Azure. Therefore, public clouds are typically used for processing continuous queries including on sensitive data. Public clouds are also preferred because of the variety of software services they provide that make the development and deployment of corresponding applications very fast. However, this trend is fueling concerns over data confidentiality and is becoming one of the major factors preventing further widespread adoption of IoT solutions. In 2019, attacks leading to exposing user data or compromised accounts were experienced by 70% of organizations using public clouds [49]. These concerns represent a significant deterrent for industry domains like healthcare to adopt public clouds. If privacy concerns are addressed, then individuals may be more open to sharing their data, which is critical for contact tracing applications to help mitigate pandemics or epidemics [12].
One way to mitigate these concerns is to encrypt data at the source (i.e., IoT devices) and solely use cloud infrastructure for storage purposes (e.g., Bolt [25]). Thus, as long as encryption keys are maintained securely by consumers, the confidentiality of their data is enforced. While this approach addresses the above confidentiality concerns, all computations need to be performed in trusted environments, thus limiting the computational capabilities of public clouds for IoT solutions.
A promising approach to overcome this bottleneck is to use homomorphic encryption and execute all operations over encrypted data. However, fully homomorphic encryption (FHE) [20] causes significant slowdowns for complex computations despite continuous advances [21, 37].
An alternative, practical approach is to use less expensive partially homomorphic encryption (PHE) [50] in combination with property-preserving encryption (PPE) [44] to execute specific operations over encrypted data. Existing solutions based on PHE and PPE have mostly focused on database and batch processing systems. For instance, the seminal CryptDB [48] was implemented on top of the MySQL database, while Crypsis [59] was implemented in Apache Hadoop (Pig), and Cuttlefish [51, 53] and Symmetria [52] were implemented in Apache Spark. Such database-centric and batch processing solutions are not a good fit for many IoT applications that are implemented as continuous queries in a stream processing system. In addition, IoT devices can vary significantly in terms of their computing power and memory capacity. To enable devices that can be very resource-constrained (e.g., embedded devices using ARM Cortex-M3 can have as little as 72 MHz processing power and 64 KB memory) to encrypt sensitive data under various PHE and PPE schemes, these must be carefully implemented and optimized.
1.2 Challenges
A straightforward application of PHE and PPE to existing stream processing solutions to support computations over encrypted data is unlikely to be practical:
R1. Complexity of cryptosystems. Dozens of PHE and PPE schemes exist, varying by operations supported, efficiency, ciphertext size, and so on; IoT application developers do not necessarily possess sufficient knowledge of cryptosystems to judiciously select among these.
R2. Complexity of queries and data. Analytical continuous queries can become quite complex, leading to the intricate intertwining and combining of data items throughout lengthy sequences of processing stages. Tracking of data lineages becomes complex, yet it is necessary to determine which PHE or PPE schemes need to be applied to initial input data.
R3. Application variables & constants. Variable initialization and constants in queries must be carefully handled to preserve the confidentiality requirements.
R4. Inherent limitations. As their names suggest, PHE schemes do not support arbitrary operations. To overcome this limitation, the query processing can continue on the trusted client side after the intermediate results have been decrypted. Alternatively, the intermediate results can be re-encrypted to the schemes required by subsequent operations, after which computation can proceed in the cloud.
R5. Cryptosystems on IoT devices. IoT devices are oftentimes resource-constrained, with limited processing power and memory capacity. The ability to support PHE and PPE cryptosystems on IoT devices in an efficient manner is a major concern.
R6. Key compromises. With applications running continuously and potentially indefinitely, secret keys used for encryption need to be updated periodically or on-demand (e.g., when a device such as a health monitor is compromised). Such updates on IoT devices should be made transparently to the IoT application and should not cause disruptions to the execution of continuous queries or lead to missing results.
R7. Resource management. Finally, processing continuous queries typically involves a pipeline of computing tasks, each of which may have one or more instances running concurrently. The deployment profile that maps task instances to virtual machines (VMs) in the cloud should make balanced use of resources to avoid bottlenecks. While some mapping heuristics are known, they do not consider encryption, which shifts bottlenecks, e.g., by altering computation/communication overhead ratios.
Fig. 1. C3PO design overview.
1.3 C3PO Overview and Roadmap
This article presents C3PO (
To perform analytics in the untrusted cloud over encrypted data while at the same time addressing challenges R1-R7, C3PO provides several novel features. After presenting background information on PHE, PPE, and continuous queries (Section 2) and giving an overview of our solution (Section 3) including the assumed threat model and architecture of C3PO, this article makes the following contributions through C3PO and its features as outlined below:
Programming abstractions (Section 4): We propose an abstraction of secure streams, embodied in the C3PO API for typical plaintext streams, to enable programmers to conveniently express confidentiality-preserving continuous query programs. C3PO automatically transforms the program to work with encrypted streams, executing it efficiently in the public cloud. Developers thus focus on the application logic and not on the details of the underlying cryptosystems (R1), nor which specific cryptosystem to use for which part of the application (R2) or how to handle variable initialization and constant encryption (R3). C3PO is capable of continuing computation in the trusted tier or re-encrypting (parts of) a data stream to enable further computation in the public cloud if a given sequence of computations cannot be performed due to PHE limitations (R4).
Encryption optimization techniques (Section 5): We introduce new PHE and PPE optimization techniques (e.g., field masking and speculative encryption) and adapt existing optimizations to the setting of IoT (e.g., pre-computation, ciphertext packing, and caching) to reduce encryption time and ciphertext size overhead and provide efficient implementations of these techniques so they can run on resource-constrained IoT devices (R5).
Key management schemes (Section 6): We introduce key management schemes to support transparent periodic and on-demand rotation of secret keys on IoT devices and enable partitioning of data spaces (IoT groups) to reduce key sharing (R6). By partitioning data in an application-aware manner and by supporting on-the-fly key changes without disrupting the execution of continuous queries, our key management schemes limit breaches in cases of key compromises without hampering practicality.
Deployment optimization technique (Section 7): We propose a deployment heuristic that analyzes resource availability and requirements and generates a deployment profile that optimizes cloud resource usage (R7). The heuristic maximizes the amount of computation performed in the cloud when splitting computation between the untrusted cloud and a small number of trusted nodes (trusted tier) used to overcome the inherent limitations of PHE. From the perspective of deployment, we also analyze the security of C3PO.
Prototype implementation (Section 8): We present the C3PO system that implements our API and other features outlined above, building on the well-known Apache Storm4 system. C3PO analyzes programs written using the C3PO API and applies the above-mentioned heuristic after identifying computations that can be executed purely on encrypted data and computations that, due to the limitations of PHE, cannot.
Performance evaluation (Section 9): We evaluate C3PO on multiple benchmarks and case studies. Our results indicate that C3PO can be used to express many real-world IoT applications while ensuring confidentiality transparently and keeping a low overhead.
We contrast C3PO with related work in Section 10 and conclude with final remarks in Section 11.
The system described in this article supersedes our STYX system [61]. STYX promoted similar programming abstractions as C3PO but did not allow for limited key sharing in time (key rotations) or space (multiple groups). This article also makes PHE and PPE schemes applicable to resource-constrained IoT devices through the use of novel optimizations and presents additional empirical evaluation results, in particular with respect to encryption (and associated optimizations) on IoT devices and key management. A high-level perspective of STYX is presented in Reference [17].
2 BACKGROUND
In this section, we present background information on PHE, PPE, and the cryptosystems employed by C3PO. We then discuss relevant details of systems that support continuous queries.
2.1 Partially Homomorphic and Property-preserving Encryption
A cryptosystem is said to be homomorphic (with respect to certain operations) if it allows computations (consisting of such operations) on encrypted data. If \(E(x)\) and \(D(x)\) denote the encryption and decryption functions for input data \(x\), respectively (omitting keys for simplicity), then a cryptosystem is said to be homomorphic with respect to operation \(\phi\) if there exists an operation \(\psi\) such that \[\begin{equation} D(E(x_1) ~\psi ~ E(x_2)) = x_1 ~\phi ~ x_2 . \tag{1} \end{equation}\] For example, a cryptosystem is said to be an additive homomorphic encryption (AHE) scheme when \(\phi\) is addition “\(+\).” Similarly, a cryptosystem is said to be a multiplicative homomorphic encryption (MHE) scheme when \(\phi\) is multiplication “\(\times\).” Another category of cryptosystems that allows computations over encrypted data is property-preserving encryption (PPE). As the name suggests, PPE schemes preserve some property of the underlying plaintext that in turn can be used to perform operations over encrypted values. These operations include order comparisons “\(\lt\)” and “\(\gt\)” (order-preserving encryption (OPE)), equality comparison “==” (deterministic (DET) encryption), or text searches by applying “pattern matching” (searchable encryption (SRCH)).
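To make Equation (1) concrete, the following minimal sketch checks the multiplicative case with unpadded (textbook) RSA, where both \(\phi\) and \(\psi\) are multiplications. The tiny primes and the helper names `encrypt`, `decrypt`, and `psi` are illustrative assumptions; the parameters are far too small to be secure, and this is not C3PO code.

```python
# Toy unpadded (textbook) RSA; parameters are illustrative and insecure.
p, q = 61, 53
n = p * q                      # public modulus
phi_n = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi_n)          # private exponent (requires Python 3.8+)

def encrypt(x):
    """E(x) = x^e mod n (no padding, hence deterministic)."""
    return pow(x, e, n)

def decrypt(c):
    """D(c) = c^d mod n."""
    return pow(c, d, n)

def psi(c1, c2):
    """Ciphertext-side operation: multiplication modulo n."""
    return (c1 * c2) % n

x1, x2 = 7, 12
# D(E(x1) psi E(x2)) equals x1 * x2 as long as the product stays below n.
product = decrypt(psi(encrypt(x1), encrypt(x2)))
```

The party computing `psi` only ever handles ciphertexts; the plaintext product is recovered solely by the holder of `d`.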
Table 1 summarizes the cryptosystems used by C3PO and shows example operations for each of them. C3PO uses Paillier [43] as its AHE scheme to avoid the high ciphertext expansion and high decryption costs of alternatives. For example, the Goldwasser-Micali cryptosystem [23] supports homomorphic addition only on single-bit inputs, which leads to larger ciphertexts, and the Benaloh cryptosystem [5] has a decryption time that depends on the security parameter, which makes decryption more expensive as that parameter increases. MHE schemes include the ElGamal [18] and unpadded RSA [50] cryptosystems. C3PO uses ElGamal for multiplications, as it is semantically secure, unlike unpadded RSA. Table 1 also shows that C3PO utilizes a set of PPE schemes that reveal uniqueness of encrypted values (DET) or ordering of encrypted values (OPE). In Section 3.1, we discuss the information leakage due to these encryption schemes and describe ways in which C3PO reduces this leakage.
| Cryptosystem | Property | Type | Security | Operations | Secondary operations |
|---|---|---|---|---|---|
| AES (CBC mode) | RND | – | Probabilistic | – | – |
| AES (CMC mode [27]) | DET | PPE | Deterministic | \(==\) | – |
| FNR [15] | DET | PPE | Deterministic | \(==\) | – |
| Boldyreva et al. [7] | OPE | PPE | Deterministic | \(\lt\), \(\gt\) | – |
| Song et al. [58] | SRCH | PPE | Probabilistic | string match | – |
| ElGamal [18] | MHE | PHE | Probabilistic | \(\times\), \(\div\) | \(\times\), \(\wedge\) |
| Paillier [43] | AHE | PHE | Probabilistic | \(+\), \(-\) | \(+\), \(\times\) |
Table 1. C3PO Cryptosystems and Operations they Support over Encrypted Data
2.2 Secondary Homomorphic Operations
The operations supported by each cryptosystem as shown in Table 1 require that their operands are both encrypted under the same cryptosystem. In addition to these “primary” operations, some cryptosystems support “secondary” operations as long as one of the operands is in plaintext form (non-sensitive operand value). Consider, for example, the Paillier cryptosystem [43] with a public key \((g, N)\), where \(g\) is the generator and \(N\) is the modulus. Paillier primarily supports addition between two encrypted values, \(E(x_1)\) and \(E(x_2)\): \[\begin{equation} D(E(x_1) \times E(x_2) \bmod N^2) = (x_1 + x_2) \bmod N \tag{2} \end{equation}\] Paillier also supports addition and multiplication between a ciphertext, \(E(x_1)\), and a (non-sensitive) plaintext, \(x_2\): \[\begin{equation} D(E(x_1) \times g^{x_2} \bmod N^2) = (x_1 + x_2) \bmod N \tag{3} \end{equation}\] \[\begin{equation} D(E(x_1)^{x_2} \bmod N^2) = (x_1 \times x_2) \bmod N \tag{4} \end{equation}\] Homomorphic subtraction can be achieved by performing an addition between the first operand and the additive inverse (by performing multiplication by \(-1\)) of the second operand: \[\begin{equation} D(E(x_1) \times E(x_2)^{-1} \bmod N^2) = (x_1 - x_2) \bmod N \tag{5} \end{equation}\] Similarly, the ElGamal cryptosystem [18] supports multiplication and division (multiplication with the multiplicative inverse) between two encrypted values and multiplication/exponentiation between an encrypted and a plaintext value.
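The primary and secondary Paillier operations of Equations (2)-(5) can be checked with a short sketch. It assumes toy primes, the common choice \(g = N + 1\), and hypothetical helper names (`add_cc`, `add_cp`, `mul_cp`, `sub_cc`); it is an illustration of the equations, not the implementation used by C3PO.

```python
import math
import secrets

# Toy Paillier parameters (assumption: tiny primes, insecure).
p, q = 1009, 1013
N = p * q
N2 = N * N
g = N + 1                                        # common generator choice
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)

def L(u):
    return (u - 1) // N

mu = pow(L(pow(g, lam, N2)), -1, N)              # requires Python 3.8+

def encrypt(m):
    """E(m) = g^m * r^N mod N^2 with fresh randomness r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(g, m % N, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    return (L(pow(c, lam, N2)) * mu) % N

def add_cc(c1, c2):        # Equation (2): ciphertext + ciphertext
    return (c1 * c2) % N2

def add_cp(c1, x2):        # Equation (3): ciphertext + plaintext
    return (c1 * pow(g, x2, N2)) % N2

def mul_cp(c1, x2):        # Equation (4): ciphertext * plaintext
    return pow(c1, x2, N2)

def sub_cc(c1, c2):        # Equation (5): ciphertext - ciphertext
    return (c1 * pow(c2, -1, N2)) % N2

x1, x2 = 1200, 345
c1, c2 = encrypt(x1), encrypt(x2)
total = decrypt(add_cc(c1, c2))
shifted = decrypt(add_cp(c1, x2))
scaled = decrypt(mul_cp(c1, x2))
diff = decrypt(sub_cc(c1, c2))
```

All results are reduced modulo \(N\), so they match the integer results only while the true values stay inside the plaintext space; subtracting the larger operand from the smaller one would decrypt to a value close to \(N\).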
Paillier and ElGamal are defined in finite cyclic groups with a configurable plaintext space. Homomorphic operations are defined in these groups, and it is up to the application programmer to ensure that the associated plaintext spaces are large enough to accommodate the application needs and avoid overflows (due to either “primary” or “secondary” homomorphic operations). This is usually not a problem, since Paillier and ElGamal commonly use a plaintext space of 2,048 bits or more, which means they can encrypt plaintext values of up to 2,048 bits. This is because the security of these cryptosystems depends on problems that are hard to solve for large numbers, such as the decisional composite residuosity assumption for Paillier and the decisional Diffie-Hellman assumption for ElGamal. In contrast, continuous query applications usually work with integer values that are 32 or 64 bits. We note that, since homomorphic operations are defined in cyclic groups, homomorphic division works only when the dividend is divisible by the divisor. Otherwise, the division operation will succeed, but the result after decryption will be an incorrect, very large (positive or negative) whole number. C3PO cannot identify a priori which homomorphic division operations will fail, but it can detect which division operations have failed after the results have been decrypted. Alternatively, C3PO can be configured to perform all divisions in the trusted tier, which poses no restrictions on the division operands.
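The division caveat can be illustrated with a toy ElGamal sketch. The modulus `P`, base `G`, the helper names, and the range check `BOUND` are all assumptions made for this example; in particular, the text does not say how C3PO detects a failed division, so the bound test below is only one plausible post-decryption check.

```python
import secrets

P = 2**61 - 1                    # Mersenne prime as a toy modulus (assumption)
G = 3                            # base element (assumption)
BOUND = 2**32                    # hypothetical largest value the application expects

secret_key = secrets.randbelow(P - 2) + 1
H = pow(G, secret_key, P)        # public key component

def encrypt(m):
    """ElGamal ciphertext is a pair (G^r, m * H^r) mod P; m must be nonzero."""
    r = secrets.randbelow(P - 2) + 1
    return (pow(G, r, P), (m * pow(H, r, P)) % P)

def decrypt(c):
    a, b = c
    shared = pow(a, secret_key, P)
    return (b * pow(shared, -1, P)) % P      # requires Python 3.8+

def multiply(c1, c2):
    """Homomorphic multiplication: multiply the components pairwise."""
    return ((c1[0] * c2[0]) % P, (c1[1] * c2[1]) % P)

def divide(c1, c2):
    """Homomorphic division: multiply by the component-wise inverses."""
    return ((c1[0] * pow(c2[0], -1, P)) % P, (c1[1] * pow(c2[1], -1, P)) % P)

exact = decrypt(divide(encrypt(84), encrypt(12)))    # 12 divides 84
inexact = decrypt(divide(encrypt(7), encrypt(2)))    # 2 does not divide 7

# The inexact result is 7 * 2^{-1} mod P, a huge group element rather than 3.5.
division_failed = inexact >= BOUND
```

Here `exact` decrypts to the true quotient, while `inexact` is a 60-bit value that still satisfies `2 * inexact ≡ 7 (mod P)`, which is why the failure only becomes visible after decryption.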
To make PHE and PPE schemes more suitable for IoT devices, we present extensions and optimizations applied to them in Section 5, where we also discuss how C3PO handles overflows while supporting negative numbers. We also give details about their implementation in Section 8.
2.3 Continuous Queries
The core abstractions offered by systems that support continuous queries are streams, tuples, and fields. Streams are unbounded sequences of tuples. Each tuple within a stream contains one or more fields with each field having an associated name. Values of each field can be accessed by dereferencing a tuple by the field name or the field index. Tuples in the same stream have the same set of fields (with distinct values). The tuples in the stream are processed in a distributed fashion. Application logic is arranged as a directed graph where vertices of the graph are computation components and edges are streams that represent the data flow between components. Application programmers write application logic for vertices of the graph. A subset of these vertices is also designated as source vertices. Source vertices act as entry points for data into the graph. Source vertices typically read data from a queue, log file, or external subscriptions. As data is generated in real-time and added to a queue, it is picked up by source vertices and forwarded down the graph for processing according to a grouping clause (described below).
Fig. 2. C3PO graph and tasks.
Figure 2 shows an example graph with four vertices with \(v_1\) designated as the source vertex. The figure also shows streams \(s_1-s_4\). Each vertex of the graph may have multiple runtime instantiations called tasks. Vertex \(v_1\) has one task running on public node \(n_1\) (a node represents a virtual or physical machine), \(v_2\) has two tasks running on public node \(n_1\), and \(v_3\) has three tasks running on public node \(n_2\). Finally, \(v_4\) has three tasks running on a trusted node \(n_3\). We refer to this assignment, i.e., the specific number of tasks per vertex, as the deployment profile of the graph.
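As a small illustration, the assignment described for Figure 2 can be written down as plain data. The dictionary layout and variable names are assumptions made for this sketch, and the stream edges are omitted because the text does not spell them out.

```python
from collections import Counter

# vertex -> (number of tasks, hosting node), following the description of Figure 2
PROFILE = {
    "v1": (1, "n1"),
    "v2": (2, "n1"),
    "v3": (3, "n2"),
    "v4": (3, "n3"),
}
TRUSTED_NODES = {"n3"}           # n1 and n2 are public (untrusted) nodes

# Count how many tasks each node hosts.
tasks_per_node = Counter()
for vertex, (num_tasks, node) in PROFILE.items():
    tasks_per_node[node] += num_tasks

# Vertices whose tasks run in the trusted tier.
trusted_vertices = [v for v, (_, node) in PROFILE.items() if node in TRUSTED_NODES]
```

In this particular profile, each of the three nodes ends up hosting three tasks, and only \(v_4\) runs on trusted resources.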
The stream emitted by each vertex is declared explicitly in the vertex itself. Once all vertices of the graph are designated, the graph is assembled by defining the input stream of each vertex and specifying the grouping clause of each stream. A grouping clause defines how tuples in a stream are partitioned among the tasks of a vertex that receives the stream. This grouping clause is also part of the definition of the graph and is provided by the programmer. Common grouping clauses are:
(1) shuffle grouping – tuples are distributed randomly across tasks in such a way that each task gets an equal number of tuples,
(2) field grouping – tuples are partitioned according to a designated field and distributed among tasks, and
(3) all grouping – the stream is replicated across all tasks.
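The three grouping clauses can be sketched as functions that split a batch of tuples among the tasks of a receiving vertex. This is a simplified, batch-style illustration with hypothetical function names; real stream processors apply the same policies tuple by tuple, and the CRC32 hash is just one possible partitioning function.

```python
import random
import zlib

def shuffle_grouping(tuples, num_tasks, seed=0):
    """Random assignment with (near-)equal shares: shuffle, then deal round-robin."""
    order = list(tuples)
    random.Random(seed).shuffle(order)
    tasks = [[] for _ in range(num_tasks)]
    for i, t in enumerate(order):
        tasks[i % num_tasks].append(t)
    return tasks

def field_grouping(tuples, num_tasks, field):
    """Tuples with the same value in `field` always reach the same task."""
    tasks = [[] for _ in range(num_tasks)]
    for t in tuples:
        key = str(t[field]).encode()
        tasks[zlib.crc32(key) % num_tasks].append(t)
    return tasks

def all_grouping(tuples, num_tasks):
    """Every task receives a full copy of the stream."""
    return [list(tuples) for _ in range(num_tasks)]

stream = [{"group": g, "value": i} for i, g in enumerate("abcabcab")]
shuffled = shuffle_grouping(stream, 3)
by_group = field_grouping(stream, 3, "group")
replicated = all_grouping(stream, 3)
```

Field grouping is the clause that matters for per-key aggregation, such as summing values per group, because it guarantees that all tuples of one group are seen by a single task.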
3 C3PO OVERVIEW
In this section, we first introduce C3PO’s threat model and discuss some assumptions about the IoT devices used in C3PO. We then present C3PO’s programming abstractions and runtime execution flow.
3.1 Threat Model
The goal of C3PO is to preserve data confidentiality in the presence of a semi-honest adversary. We assume the adversary has read-only access to the cloud nodes and can observe data residing in the nodes, execution of applications, and any generated intermediate results. We assume that the adversary cannot make changes in the queries, results, or data stored in the cloud and consider integrity and availability attacks to be out of scope for our system. We also consider IoT device compromises out of the scope of our article and focus on preserving the confidentiality of data in the untrusted cloud. We later relax this requirement and show how encryption keys of IoT devices can be updated in case of a compromise (Section 6). We assume that C3PO has access to a limited set of trusted resources outside the cloud (e.g., where the query results are used). As we will see shortly, this environment is leveraged to perform a few specific computations.
C3PO utilizes a set of encryption schemes such as DET and OPE schemes that are deterministic (see Table 1) and are known to provide lower security guarantees. DET schemes reveal the uniqueness of encrypted values, since the same plaintext is encrypted into the same ciphertext, unlike probabilistic schemes that randomize ciphertexts. OPE schemes reveal the ordering of encrypted values and, in some instances, they have been shown to leak partial plaintext [22, 24, 39]. To reduce the use of DET and OPE schemes, C3PO issues a warning to the programmer when the application requires DET or OPE. The programmer can then either deploy the parts of the query that would otherwise require DET or OPE operations on the trusted resources, at the expense of performance, or deploy the application as is, an option that could be viable if the data requiring DET or OPE holds semi-sensitive or high-entropy information such as timestamps. C3PO could also benefit from using database indices such as ArxRange and ArxEq introduced in recent work [46] to perform range and equality queries, respectively, in a manner that preserves semantic security, but we have not yet incorporated these primitives into the current implementation of C3PO.
3.2 IoT Device Assumptions
IoT device classes. We assume that IoT devices used in C3PO are of the C2 class or higher (see RFC7228 [8], “Classes of Constrained Devices”) with at least 50 KB of RAM and a CPU operating at a frequency of at least a few tens of MHz. In addition, we assume that IoT devices are in the E9 class of energy limitation (RFC7228 [8], “Classes of Energy Limitation”) with no direct quantitative limitations to available energy. We plan to incorporate battery-powered IoT devices with limited energy capacity and examine the effect encryption has on battery life as part of our future work.
Key sharing. Managing encryption keys for IoT devices in a distributed setting is a challenging problem [10, 65]. To distribute and manage keys, we use a public key infrastructure (e.g., Keybase [30], ZeroTier [54]), a standard assumption in multi-party systems. However, secret keys on devices can be compromised by an attacker having access to the device. Trusted hardware solutions available for various classes of IoT devices can be exploited to securely store secret keys [3]. Key sharing and storage is an independent active research topic, beyond the scope of this article; the focus of C3PO is to ensure the confidentiality of the data in the cloud. In C3PO, we assume that all IoT devices are owned by a single party (e.g., a healthcare provider issuing health monitoring IoT devices). IoT devices are connected to the Internet and are capable of establishing a secure, authenticated channel to the device owner (key manager), allowing keys to be updated using standard protocols (e.g., TLS [16]) following prior work [57].
3.3 Programming Abstractions
One of the main challenges of computing over encrypted data is that the application developer needs to have a detailed understanding of each cryptosystem used to encrypt fields of a stream. Adoption of PHE and even FHE for generic application development will depend on the ease with which a programmer can incorporate the properties offered by the cryptosystem into their regular programming tasks. C3PO tackles this problem by offering simple programming abstractions to express and operate on encrypted data streams. Fields representing sensitive data are defined using the
3.4 Execution Flow
Fig. 3. C3PO execution flow.
Figure 3 outlines the steps followed by C3PO to set up and deploy an application securely in an untrusted cloud. Application programmers use the C3PO API and associated annotations to describe a graph that contains the application logic. C3PO then performs homomorphism analysis on the graph to generate an encryption strategy, containing the cryptosystems required to execute the graph in a confidential manner. The encryption strategy is then passed to the key manager that generates keys for each cryptosystem, as described in Section 6, and sends them to the IoT devices using a secure channel. Next, C3PO analytically identifies the number of tasks required for different vertices and schedules the graph for execution. C3PO leverages the idea that oftentimes users have some limited (but trusted) computing resources available. We refer to these resources as the trusted tier. The compute resources in the cloud, though potentially unlimited for practical purposes, are untrusted. C3PO utilizes the trusted tier for application development and compilation and uses the cloud for the deployment phase. In deployments that require resources from the trusted tier, C3PO tries to minimize their usage. The deployment steps are detailed in Section 7.
4 C3PO STREAM PROCESSING
In this section, we describe the programming abstractions used in C3PO and explain how these abstractions are used in addressing challenges R1–R4 and leveraged for improving performance.
4.1 Programming Model
Application programmers use the abstractions provided by our C3PO Java API to specify the C3PO graph. Each vertex in the graph is designed as a separate class by extending the
Abstractions. Next, we explain the abstractions that are new to C3PO over typical stream processing systems such as Storm:
Example.
List. 1. C3PO code for finding the sum of each group in a sliding window.
List 1 shows a code snippet used in a C3PO vertex class extending the
List. 2. Code for finding the sum of each group in a sliding window without C3PO abstractions.
List 2 shows just the function
(1) know that Paillier is the correct cryptosystem to use for performing additions;
(2) explicitly read the Paillier public key (Line 3) that contains the generator, \(g\), and the modulus, \(N\); and
(3) perform the exact computation \(\psi\) (see Equation (1)) for homomorphic addition with Paillier, namely multiplication modulo the square of the modulus \(N\) of the public key (Line 11), including handling of
These implementation complexities are not specific to summation and the Paillier cryptosystem. For example, ciphertexts of the ElGamal cryptosystem contain two components and homomorphic multiplication of two ciphertexts is achieved by multiplying the two components of the ciphertexts, respectively, to generate the encrypted result. Similarly, equality comparisons, order comparisons, and search over encrypted data operations require non-trivial computations over the ciphertexts. The C3PO API hides all these implementation complexities from the application programmer.
4.2 Processing Secure Streams
We now give details on how C3PO tackles the challenges R1–R4 introduced in Section 1.2 when processing continuous queries over encrypted streams.
Identifying encryption schemes (R1, R2). This step identifies the cryptosystems that are required for the various fields based on the operations that the application wishes to perform on those fields. To apply these inferences, C3PO first has to identify different streams and their grouping clauses in the application logic. These can be derived from the graph declaration provided by the application programmer, as explained in Section 4.1. Second, C3PO derives the operations performed on each stream from program annotations (
Once C3PO derives the distinct streams and operations to be performed on those streams, we can proceed similarly as in our prior work [60] to infer the cryptosystems required to execute the graph. In brief, we start by constructing an expression tree where fields in tuples form the leaf nodes. Operations performed on those fields form the non-leaf nodes. For C3PO graphs, we use field annotations (as specified in Section 4.1) to determine the non-leaf, operator nodes. For each operator, a lookup table identifies the cryptosystem of the operands and the result of the operator. Our goal now is to identify the cryptosystem in which all the leaf nodes (fields) should be encrypted. This can be done by identifying the parent operator node for each leaf node and using the lookup table to identify the type of cryptosystem required for operands for that operator node.
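The lookup-based inference described above can be sketched as a walk over an expression tree. The tree encoding, the table contents (derived from Table 1), the assumption that comparison results carry no cryptosystem, and the names `OPERAND_SCHEME`, `RESULT_SCHEME`, and `analyze` are all hypothetical; this is not the C3PO analyzer itself.

```python
# Cryptosystem class required for the operands of each operator (from Table 1).
OPERAND_SCHEME = {"+": "AHE", "-": "AHE", "*": "MHE", "/": "MHE",
                  "==": "DET", "<": "OPE", ">": "OPE", "match": "SRCH"}
# Cryptosystem class of each operator's result (assumption: comparisons yield none).
RESULT_SCHEME = {"+": "AHE", "-": "AHE", "*": "MHE", "/": "MHE",
                 "==": None, "<": None, ">": None, "match": None}

def analyze(tree):
    """Return (schemes needed per field, child/parent operator mismatches).

    A tree is either a field name (leaf) or a tuple (operator, left, right).
    """
    fields = {}
    mismatches = []

    def visit(node, parent_op):
        if isinstance(node, str):                       # leaf node: a field
            if parent_op is not None:
                fields.setdefault(node, set()).add(OPERAND_SCHEME[parent_op])
            return
        op, left, right = node
        # A child result that the parent cannot consume marks a point where the
        # trusted tier or a re-encryption step would be needed.
        if parent_op is not None and RESULT_SCHEME[op] != OPERAND_SCHEME[parent_op]:
            mismatches.append((op, parent_op))
        visit(left, op)
        visit(right, op)

    visit(tree, None)
    return fields, mismatches

# (x + y) > alpha: the sum is produced under AHE, but ">" expects OPE operands.
fields, mismatches = analyze((">", ("+", "x", "y"), "alpha"))
```

For this input the sketch reports that `x` and `y` need an AHE scheme, `alpha` needs OPE, and the `+` under `>` is a mismatch, which corresponds to the kind of condition that cannot be evaluated entirely over ciphertexts in the cloud.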
Handling public streams (R2). An application might need to process plaintext data, such as publicly available stock quotes, from public streams as well as encrypted data. Public data can still be used in combination with private, encrypted streams to carry out useful computations. C3PO achieves this by allowing vertices to receive tuples containing both encrypted (
Field masking (R1). Computing on encrypted data introduces the additional challenge of dealing with operands with increased sizes. For instance, an addition of two 32-bit
Initialization and constants (R3). Oftentimes, application logic requires variables to be initialized to a specific value, say, \(\alpha\). To preserve confidentiality, value \(\alpha\) cannot remain in plaintext and should instead be encrypted under the appropriate cryptosystem during program compilation. To identify the appropriate cryptosystem for encrypting \(\alpha\), C3PO first identifies the operation and the
Automatic re-encryption (4). Once the analyzer determines the cryptosystems required for each stream, it may detect situations where some operations cannot be performed over the available cryptosystems in the cloud. This can occur if there is a mismatch between parent and child operator nodes, because they express operations not supported by the same cryptosystem. In such cases, C3PO can either perform those operations in the trusted tier or re-encrypt the stream in the trusted tier. For example, for conditions that require more than one encryption scheme to be used on the same variable, e.g., \(x + y \gt \alpha\), or for conditions that include a public value, such as \(secret\_value \gt public\_const\), where the public value cannot be encrypted as it is already public, C3PO performs the entire control structure in the trusted tier. For re-encryption, C3PO inserts special re-encryption vertices into the graph and marks them so they get scheduled on the trusted tier only.
5 PHE AND PPE FOR IOT DEVICES
To ensure confidentiality, sensitive information needs to be encrypted at the source (IoT devices) before it is sent to the public cloud for processing. Encryption under PHE and PPE schemes is typically computationally expensive, and a straightforward use of these schemes on IoT devices with limited resources is unlikely to be practical. In this section, we introduce a set of optimizations, as well as extensions to previously proposed optimizations, that reduce the time and space overheads associated with PHE and PPE encryption. These make the schemes more practical for use in resource-constrained devices, thereby addressing challenge 5.
5.1 Pseudorandom Number Pre-computation
Encryption functions of probabilistic (PHE) schemes such as Paillier and ElGamal encrypt values by first generating a large pseudorandom number (PRN) and then carrying out computations involving the PRN and the plaintext value. Generating this PRN is oftentimes the most expensive operation of the encryption function, but PRNs can be generated independently of encryption requests and stored for later use. C3PO leverages this fact to improve the performance of encryption. IoT devices in C3PO pre-compute and store a small number of PRNs during times when the devices are otherwise idle. When a new value needs to be encrypted, the encryption function first checks whether an unused PRN exists and, if so, uses it to complete the encryption request. Otherwise, the encryption function generates a fresh PRN. In applications where IoT devices take measurements sparsely and have plenty of idle time between measurements to pre-compute PRNs, PRN pre-computation drastically improves encryption performance.
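The check-then-fallback logic above can be sketched as a small bounded pool. This is an illustrative sketch only: the class name `PrnPool`, its methods, and the use of Python's `secrets` module as the PRN source are assumptions, not part of C3PO.

```python
import secrets
from collections import deque

class PrnPool:
    """Bounded pool of pre-computed pseudorandom numbers (PRNs)."""

    def __init__(self, modulus, capacity=8):
        self.modulus = modulus
        self.capacity = capacity
        self.pool = deque()

    def _fresh(self):
        # Uniform value in [1, modulus - 1].
        return secrets.randbelow(self.modulus - 1) + 1

    def precompute(self):
        """Call while the device is idle; fills the pool up to capacity."""
        while len(self.pool) < self.capacity:
            self.pool.append(self._fresh())

    def take(self):
        """Return (prn, was_precomputed). A stored PRN is removed from the pool
        so it is never used twice; an empty pool falls back to fresh generation."""
        if self.pool:
            return self.pool.popleft(), True
        return self._fresh(), False

pool = PrnPool(modulus=2**61 - 1, capacity=2)
pool.precompute()
used = [pool.take()[1] for _ in range(3)]   # third request finds the pool empty
```

The capacity would be chosen according to the memory available on each device.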
5.2 Support for Negative Numbers
C3PO uses Paillier and ElGamal to carry out arithmetic operations homomorphically over encrypted data. By default, these cryptosystems, as well as existing PHE-based systems that use these cryptosystems [48, 59, 61], do not support operations that involve negative numbers. To add support for negative numbers in C3PO, we introduce alternative implementations for the encryption and decryption functions for both Paillier and ElGamal.
To achieve this, we leverage the fact that most homomorphic encryption schemes, including Paillier and ElGamal, operate on large plaintext and ciphertext spaces, as discussed in Section 2.2. C3PO uses Paillier and ElGamal with a 2,048-bit plaintext space and a 4,096-bit ciphertext space. In comparison, the actual message space required by applications is much smaller, e.g., 32 bits for
5.3 Ciphertext Packing
As mentioned above, the plaintext space of Paillier and ElGamal is larger than the message space needed to represent numbers in applications. For example, encrypting a 32-bit integer value under Paillier or ElGamal will produce a 4,096-bit ciphertext that has a \(128\times\) ciphertext size overhead. To reduce ciphertext size overhead, C3PO adapts a technique introduced by Ge et al. [19] to pack multiple plaintext values into a single ciphertext. Ciphertext packing works by concatenating multiple messages into a single plaintext value before encrypting. For example, values \(a_1, a_2, \ldots , a_n\) can be concatenated into \(a_1 \circ a_2 \circ \cdots \circ a_n\) before being encrypted, where \(\circ\) indicates bit-string concatenation. As homomorphic operations are performed on ciphertexts, the operations are carried out on the underlying packed values separately.
A potential issue when carrying out operations over ciphertexts that pack multiple values is overflow, where the result for one packed value overflows into the preceding one, leading to incorrect results. Ge et al. demonstrate packing for AHE schemes and solve overflows by using multiple groups and keeping partial sums per group, taking care to pack values only in a way that cannot overflow. This approach works well in the database setting assumed by Ge et al. but does not work in a continuous query setting, because values are generated in real time and cannot be known beforehand. Instead, in this work, we introduce another approach where before each packed value we include a series \(P\) of 0 bits so that, in case of overflow, the preceding value will not be affected: \(P\circ a_1 \circ P\circ a_2 \circ \cdots \circ P\circ a_n\). Furthermore, in the next few paragraphs, we demonstrate a novel way of ciphertext packing for AHE as well as MHE schemes and a method of packing values after they have been encrypted, which we call post-encryption packing.
AHE packing. Homomorphic addition can be carried out when ciphertexts contain packed values, because arithmetically \((a_1 \circ \cdots \circ a_n) + (b_1 \circ \cdots \circ b_n) = (a_1 + b_1) \circ \cdots \circ (a_n+ b_n)\). To avoid overflows, we calculate the total number of bits, \(T\), that need to be allocated for each packed item and the number of items that can be packed in a single plaintext, \(I\), before encrypting as follows:
(9) \[\begin{equation} T= P+ M=\lfloor \log_2(R(2^{M}-1)) \rfloor + 1, \end{equation}\]
(10) \[\begin{equation} I= \Bigl \lfloor \frac{K}{T} \Bigr \rfloor . \end{equation}\]
In the above equations, \(P\) is the number of padding bits needed to capture overflows, \(M\) is the bit size of each message (e.g., \(M= 32\) for
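Equations (9) and (10) and the slot-wise addition property can be checked on plaintexts with a short sketch. It assumes \(R\) is the maximum number of \(M\)-bit values summed into one slot and \(K\) is the plaintext size in bits (2,048 here). The helper names and the choice to place the first value in the least significant slot are illustrative assumptions. The plain integer addition stands in for what homomorphic addition does to the underlying packed plaintexts.

```python
def slot_bits(M, R):
    """Equation (9): T = floor(log2(R * (2^M - 1))) + 1."""
    return (R * (2**M - 1)).bit_length()

def items_per_plaintext(K, T):
    """Equation (10): I = floor(K / T)."""
    return K // T

def pack(values, T):
    """Place values[i] in bits [i*T, (i+1)*T); the high bits of each slot are padding."""
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (i * T)
    return packed

def unpack(packed, T, count):
    mask = (1 << T) - 1
    return [(packed >> (i * T)) & mask for i in range(count)]

M, R, K = 32, 1000, 2048
T = slot_bits(M, R)               # 42 bits per slot: 32 message bits + 10 padding bits
I = items_per_plaintext(K, T)     # 48 items fit in one 2,048-bit plaintext

a = pack([4_000_000_000, 7, 123], T)
b = pack([4_000_000_000, 8, 877], T)
sums = unpack(a + b, T, 3)        # slot-wise sums; the first exceeds 32 bits
```

The first slot's sum (8,000,000,000) no longer fits in 32 bits, but the padding absorbs the carry and the neighboring slots stay intact.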
MHE packing. C3PO also supports packing for MHE, but to a limited degree, because in multiplication each packed item of a ciphertext is multiplied with all packed items of the other ciphertext. We therefore fix the number of packed items to 2 and now, arithmetically, we have \((a_1 \circ a_2) \times (b_1 \circ b_2) = (a_1 \times b_1) \circ (a_1 \times b_2 + a_2 \times b_1) \circ (a_2 \times b_2),\) which includes the intermediate term \((a_1 \times b_2 + a_2 \times b_1)\). By ignoring the intermediate term, we get the required \((a_1 \times b_1) \circ (a_2 \times b_2)\). Note that every multiplication generates an additional term in the ciphertext. To ignore these terms after decrypting, we extend the ciphertext of our MHE scheme to include a counter indicating how many multiplications have been performed to generate that ciphertext. When decrypting, this counter is used to identify how many intermediate terms need to be ignored to get the correct result. We compute the total number of bits, \(T\), allocated for each packed item as
(11) \[\begin{equation} T= P+ M= \lfloor \log_2 (2^{M} - 1)^{R} \rfloor + 1 \approx MR, \end{equation}\]
where \((2^{M} - 1)^{R}\) is the maximum possible number after \(R\) items of \(M\) bits each are multiplied together. We calculate the number of tuples, \(R\), that can be aggregated as follows:
(12) \[\begin{equation} R= \Bigl \lfloor \frac{K}{T} \Bigr \rfloor - 1 \Rightarrow R\approx \Bigl \lfloor \frac{\sqrt {M^2+4MK}-M}{2M} \Bigr \rfloor . \end{equation}\]
In the above equation, \(-1\) accounts for the intermediate terms discussed above. By replacing \(T\) with the approximation \(MR\) and solving for the positive root of the quadratic equation, we get the final term. The above equations show that when packing 32-bit integers into a plaintext space of 2,048 bits, a total of \(R= 7\) packed ciphertexts can be multiplied together before an overflow exceeds the padding bits.
To continue performing multiplications after this, the packed ciphertext needs to be refreshed by being re-encrypted using a trusted node. This suggests that MHE packing is mostly useful in applications that need to perform multiplications infrequently or when there are frequent key changes (discussed in Section 6) as part of which the packed ciphertext can be refreshed. MHE packing does not support secondary operations.
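The following sketch evaluates the closed form of Equation (12) and illustrates, on plaintexts only, how the multiplication counter locates the two useful terms of a packed product. The function names and the demo slot width of 32 bits are assumptions for illustration. The slot width is chosen generously so that the intermediate terms cannot carry, and it is not derived from Equation (11).

```python
import math

def max_multiplications(M, K):
    """Closed form of Equation (12): floor((sqrt(M^2 + 4MK) - M) / (2M))."""
    return math.floor((math.sqrt(M * M + 4 * M * K) - M) / (2 * M))

def pack2(hi, lo, T):
    """Pack two items as hi o lo with a slot width of T bits."""
    return (hi << T) | lo

def unpack_product(prod, T, mults):
    """Recover (hi, lo) from a product of mults + 1 two-item packed values,
    skipping the `mults` intermediate terms that sit between them."""
    mask = (1 << T) - 1
    lo = prod & mask
    hi = prod >> ((mults + 1) * T)
    return hi, lo

R = max_multiplications(32, 2048)        # 32-bit items, 2,048-bit plaintext space

T = 32                                   # demo slot width (assumption)
a, b, c = pack2(3, 5, T), pack2(7, 11, T), pack2(2, 4, T)
one_mult = unpack_product(a * b, T, mults=1)         # (3*7, 5*11)
two_mults = unpack_product(a * b * c, T, mults=2)    # (3*7*2, 5*11*4)
```

For \(M = 32\) and \(K = 2{,}048\) the closed form gives \(\sqrt{263{,}168} \approx 513\), so \(R = \lfloor (513 - 32)/64 \rfloor = 7\), matching the value stated above.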
Post-encryption packing. The packing techniques described above require multiple messages to be packed into a single plaintext before being encrypted. Often it is not possible to pack messages before they are encrypted, e.g., when the messages are generated over time and not available at the moment of encryption. It is still beneficial to pack values after they have been encrypted, through post-encryption packing, to reduce ciphertext size and decryption times. Post-encryption packing is particularly useful for continuous query applications that retain aggregated values over long periods of time, e.g., to keep track of daily, weekly, monthly, or yearly statistics. These aggregated values can be packed together as a single ciphertext value. We support post-encryption packing for AHE schemes. To pack multiple ciphertexts into a single one, we first (homomorphically) shift the ciphertexts appropriately and then add them up:
(13) \[\begin{equation} c_p = \sum \limits _{i=0}^{n-1} c_i \otimes 2^{iT} . \end{equation}\]
In the above equation, \(\otimes\) denotes homomorphic multiplication between a ciphertext and a plaintext value (see Section 2.2), and \(T\) indicates the total number of bits required per packed item including padding bits, calculated as per Equation (9). \(n \le I\) is the number of ciphertexts to pack together where \(I\) is calculated using Equation (10), \(\sum\) denotes homomorphic summation, and \(c_p\) is the resulting packed ciphertext.
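Equation (13) can be exercised with a toy Paillier instance. The sketch below is for illustration only: it uses two small, publicly known Mersenne primes, which is far too small to be secure, and the function names are assumptions. In Paillier, multiplying a ciphertext by a plaintext corresponds to modular exponentiation, and homomorphic addition corresponds to multiplication modulo \(N^2\).

```python
import math
import secrets

# Toy Paillier key from two known Mersenne primes; NOT secure, demo only.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)                                 # valid for generator g = N + 1

def encrypt(m):
    # r is coprime to N except with negligible probability.
    r = secrets.randbelow(N - 1) + 1
    return ((1 + m * N) * pow(r, N, N2)) % N2

def decrypt(c):
    return ((pow(c, LAM, N2) - 1) // N) * MU % N

def post_encryption_pack(ciphertexts, T):
    """Equation (13): c_p = sum_i c_i (x) 2^(iT), computed on ciphertexts only."""
    packed = 1                                   # a valid encryption of 0
    for i, c in enumerate(ciphertexts):
        shifted = pow(c, 1 << (i * T), N2)       # c_i (x) 2^(iT)
        packed = (packed * shifted) % N2         # homomorphic addition
    return packed

values = [5, 7, 11]
T = 40                                           # demo slot width in bits (assumption)
c_p = post_encryption_pack([encrypt(v) for v in values], T)
plain = decrypt(c_p)
unpacked = [(plain >> (i * T)) & ((1 << T) - 1) for i in range(len(values))]
```

Only the single ciphertext `c_p` needs to be stored and decrypted, instead of one ciphertext per value.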
5.4 Caching and Speculative Encryption
Deterministic schemes used in C3PO such as DET and OPE produce the same ciphertext for a fixed plaintext value. C3PO leverages this fact by having IoT devices store a small number of encrypted values that are likely to be re-used. Specifically, C3PO uses a map that can hold a fixed number of plaintext to ciphertext key-value pairs and imposes an LRU policy to remove pairs once the map is full. The number of items that the map can hold is configurable and adjusted depending on the memory capacity of each IoT device. We extend the definitions of the encryption functions for DET and OPE to first search this map to see if the plaintext-to-ciphertext key-value pair exists, and if so return the corresponding ciphertext, otherwise encrypt the given plaintext and add it to the map.
To further reduce the encryption time overhead, C3PO uses speculative encryption by predicting what values will need to be encrypted next. C3PO encrypts and stores a small number of plaintext-ciphertext pairs proactively during times that IoT devices are idle. Speculative encryption is mostly useful in scenarios where the range of possible values is small, or for low entropy values, such as when measuring temperature in a closed environment.
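A minimal sketch of the bounded LRU map and of speculative pre-encryption follows. The class name `CachingEncryptor`, its methods, and the stand-in function `toy_det` are hypothetical; `toy_det` is not a cipher and only marks where a real DET or OPE encryption call would go.

```python
from collections import OrderedDict

class CachingEncryptor:
    """Wraps a deterministic encryption function with a bounded LRU map."""

    def __init__(self, encrypt_fn, capacity):
        self.encrypt_fn = encrypt_fn      # deterministic scheme, e.g., DET or OPE
        self.capacity = capacity          # tuned to the device's memory
        self.cache = OrderedDict()        # plaintext -> ciphertext, oldest first

    def encrypt(self, plaintext):
        if plaintext in self.cache:
            self.cache.move_to_end(plaintext)    # mark as most recently used
            return self.cache[plaintext]
        ciphertext = self.encrypt_fn(plaintext)
        self.cache[plaintext] = ciphertext
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)       # evict least recently used pair
        return ciphertext

    def speculate(self, likely_values):
        """Call while idle: encrypt values predicted to be needed soon."""
        for value in likely_values:
            self.encrypt(value)

calls = []
def toy_det(p):                # stand-in for a real deterministic cipher
    calls.append(p)
    return f"ct({p})"

enc = CachingEncryptor(toy_det, capacity=2)
enc.speculate([20, 21])        # e.g., likely temperature readings
hit = enc.encrypt(21)          # served from the map, no encryption performed
miss = enc.encrypt(22)         # encrypted on demand; evicts the pair for 20
```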
5.5 Format-preserving Encryption
Oftentimes, continuous queries include equality comparisons involving values of a fixed format such as dates, timestamps, or phone numbers. To perform equality comparisons, these values need to be encrypted under a deterministic scheme. Naïvely encrypting fixed-format values under a deterministic scheme such as AES will result in a ciphertext that is at least the size of the block of the cryptosystem used. Instead, C3PO employs existing format-preserving encryption techniques via the use of the FNR [15] cryptosystem that generates \(n\)-bits of ciphertext for \(n\)-bits of input plaintext as long as the plaintext is smaller than 128 bits. For plaintext values larger than 128 bits, C3PO uses AES, as shown in Table 1. This optimization allows C3PO to reduce the ciphertext size overhead for values that need to be encrypted under a deterministic scheme. Keeping the ciphertext size overhead small leads to smaller end-to-end latency, because the data that needs to be transmitted from the IoT devices to the cloud nodes for processing is smaller. Furthermore, using FNR to keep ciphertext size smaller is particularly useful when also employing the caching and speculative encryption optimization described above. Since C3PO supports devices with as little as 64 KB of memory, having ciphertexts of smaller size allows IoT devices to retain a larger number of cached ciphertexts in memory.
6 IOT KEY MANAGEMENT
To reduce the risk of secret keys being compromised in continuous query applications, C3PO rotates keys periodically or on-demand without causing disruptions to query executions. In this section, we thereby address challenge 6 and describe key management in C3PO, with a particular focus on the replacement and sharing of keys.
6.1 Key Sharing
As mentioned in Section 3.2, we consider a setting where all IoT devices are owned by a single party and managed by a trusted key manager that distributes keys to IoT devices. To limit the sharing of keys across devices and over time, we employ the following techniques in C3PO:
Key rotation: update keys periodically (or in the event of a key compromise) without service disruption.
Multi-group mode: limit the number of devices that share an encryption key by splitting devices into multiple groups, each with a different key.
Field-level key identification: ensure fields that are not part of a common operation do not share encryption keys.
6.2 Key Rotation
A key challenge of using homomorphic encryption for IoT-based streams is that applications are often long-lived, which increases the chance of key compromises. To mitigate this, C3PO allows encryption keys to be rotated (updated) periodically or on-demand. Periodic key rotations could be part of a security policy. Further, if a key has been compromised, then an on-demand key rotation is initiated. Due to the nature of continuous queries and because homomorphic operations can only be carried out on operands encrypted with the same key, key rotations are not straightforward. Thus, we first start with an explanation of how keys are rotated in the general case and then present how to handle queries that involve aggregated values or sliding windows. Then, we present a method that helps reduce the number of keys that need to be rotated in case of a compromise.
General case. C3PO supports key rotations without disrupting the output. Usually, all IoT devices are considered to form a single logical group and share the same keys. A direct consequence of this is that every key leak will result in all devices needing to update their encryption keys, a problem that we address shortly with our multi-group mode. When a key change is initiated by the key manager, each IoT device first emits a key change marker that includes the new key identifier. When the
Fig. 4. Key change in continuous queries. Streams flow from left to right (rightmost element, \( x_1 \), is oldest). After \( x_2 \) is emitted a key change is initiated.
Sliding window. The general case described above does not consider queries that contain computations involving older values received via the stream, such as computations involving a sliding window. An example of such a computation would be a query that computes the sum of the last few received items (or similarly, the sum over a certain time interval based on timestamps). In this case, results of computations that occurred under the previous key must be included in subsequent computations. C3PO follows the same process as before, and the stream encrypted with the new key is channeled into the new instance of the vertex class, but this time the stream encrypted with the old key continues processing uninterrupted. Meanwhile, C3PO suppresses emissions from the new vertex instance until the new instance contains tuples spanning the full length of the sliding window. At this point, the old instance of the application vertex class is discarded and the stream from the new instance is emitted. Figure 4 illustrates this. Figure 4(a) shows the effect of a naïve key change in an encrypted data stream. Values are first encrypted with key \(k_1\) and then \(k_2\). Aggregations with sliding windows that span values encrypted with both keys will fail. In Figure 4(b), when a key change is initiated, a specific number of values (equal to the size of the aggregation window) are encrypted with both keys \(k_1\) and \(k_2\). This allows aggregations to preserve semantics, i.e., avoid any disruption in output.
This solution works well as long as the sliding window is small. For large sliding windows, or for queries with aggregation functions that span the entire duration of the query, the above solution becomes inefficient, since for every key change, and as long as the sliding window does not end, another stream is added. Instead, C3PO handles this case by using the trusted tier to re-encrypt the current aggregated result under the new key. After the re-encryption step, the query can correctly handle tuples encrypted with new keys.
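The small-window hand-over can be simulated with a short sketch. It is a simplified model under stated assumptions: each measurement is represented as a dictionary from key identifier to value, plain integers stand in for ciphertexts, plain sums stand in for homomorphic sums, and key change markers are not modeled. The names `WindowSum` and `run` are hypothetical.

```python
from collections import deque

class WindowSum:
    """One vertex instance: sliding-window sum over tuples under a single key."""
    def __init__(self, key_id, size):
        self.key_id = key_id
        self.window = deque(maxlen=size)

    def push(self, value):
        self.window.append(value)
        return sum(self.window)

    def full(self):
        return len(self.window) == self.window.maxlen

def run(stream, size, first_key):
    """stream: list of {key_id: value}. Returns the emitted (key_id, window_sum) pairs."""
    active, pending, out = WindowSum(first_key, size), None, []
    for tup in stream:
        new_keys = [k for k in tup if k != active.key_id]
        if new_keys:                                   # tuple also carries the new key
            if pending is None:
                pending = WindowSum(new_keys[0], size)
            pending.push(tup[new_keys[0]])             # output suppressed for now
        if pending is not None and pending.full():
            active, pending = pending, None            # discard the old instance
            out.append((active.key_id, sum(active.window)))
        else:
            out.append((active.key_id, active.push(tup[active.key_id])))
    return out

# Window of 3; key change after x_2, so x_3..x_5 are sent under both keys.
stream = [{"k1": 1}, {"k1": 2},
          {"k1": 3, "k2": 3}, {"k1": 4, "k2": 4}, {"k1": 5, "k2": 5},
          {"k2": 6}, {"k2": 7}, {"k2": 8}]
emitted = run(stream, size=3, first_key="k1")
```

The emitted sums are the same as those of an uninterrupted sliding-window sum; only the key under which each result is encrypted changes, at the fifth measurement.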
6.3 Multi-group Mode
To reduce the surface of affected devices in the event of a key compromise, C3PO introduces the multi-group mode. In this mode, IoT devices are grouped into logical subsets and a different key is assigned to each set (where otherwise the key would be the same). IoT devices can be grouped together based on any user-defined criteria. Devices that are behind the same gateway usually make a good grouping. This allows us to rotate the encryption keys of devices behind a specific gateway independent of devices outside the gateway.
Operations across groups. At the processing end, multiple groups lead to additional tasks for
List. 3. C3PO combiner implementation for aggregation in multi-group mode.
Key rotation. So far the discussion about key rotations assumed a single logical group containing all IoT devices. In this setting, in case of a key compromise, all devices need to replace the compromised key with a new one. The multi-group mode allows us to initiate key rotations for any individual group. List 3 shows a
6.4 Field-level Key Identification
For operations to be performed over encrypted data, fields involved in the same operation must be encrypted using the same key. Conversely, fields not involved in the same operation should use different keys to prevent leaking relations between fields unnecessarily and to minimize the impact of compromised keys. For example, to perform the operation \(x_1+x_2\), both \(x_1\) and \(x_2\) must be encrypted with the same AHE scheme and using the same key, or the operation will generate a wrong result. Separately, if we also need to perform the operation \(x_3+x_4\), then \(x_3\) and \(x_4\) must share a key with each other, but that key can differ from the one used for \(x_1\) and \(x_2\), even though all four fields need to be encrypted under an AHE scheme to carry out the additions. We capture these field groupings by assigning fields into “field families” that indicate which fields are involved in the same operations, directly or indirectly. Following this intuition, we derive two invariants that need to hold for all keys to minimize data leaks while preserving program correctness.
I1. Correctness: Fields involved in the same kind of operations and belonging to the same field family need to be encrypted with the same key.
I2. Security: Fields involved in different kinds of operations, or belonging to different field families, or both, need to be encrypted with different keys.
Fig. 5. C3PO key management.
6.5 Key Generation
The key manager uses key group information and the encryption strategy generated during the homomorphism analysis step of the compilation to decide how to generate keys, as shown in Figure 5. It then associates a key to each field in a manner that satisfies invariants I1 and I2 introduced above. More specifically, keys are generated based on the equation given below:
(14) \[\begin{equation} K_{c,f,g} = \textrm {PRP}_{\textrm {MK}}(c, f, g), \end{equation}\]
where PRP is a pseudo-random permutation (e.g., AES block cipher) and MK is the master key, known only to the key manager, from which all other keys are derived. Furthermore:
Cryptosystem c: c indicates the cryptosystem the key is used for. Since the cryptosystem used indicates the operation that can be performed over encrypted data, by invariant I2, different cryptosystems should lead to different keys.
Field family f: f indicates the field family of fields. By invariant I1 fields of the same field family need to be encrypted under the same key. For different field families, C3PO uses a different f, which will generate a different key, even if the cryptosystem is the same.
Group g: g captures the group identifier to support multi-group mode. In the general case (single-group mode), there is simply a single all-encompassing group.
Once the keys are generated, the key manager opens a secure channel with each IoT device and sends only the keys each device requires, according to what fields each device generates and what group it belongs to. To keep track of what keys are sent to each device and to be able to identify which devices need to be sent new keys during key rotation, the key manager keeps a map of key IDs per device (key metadata).
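The derivation in Equation (14) can be sketched as follows. Note the substitution: Python's standard library has no AES, so HMAC-SHA256, a pseudo-random function rather than a pseudo-random permutation, stands in for the PRP here. The function name `derive_key`, the string identifiers, and the length-prefixed input encoding are assumptions for illustration.

```python
import hashlib
import hmac

def derive_key(master_key: bytes, cryptosystem: str, field_family: str, group: str) -> bytes:
    """Derive K_{c,f,g} from MK; HMAC-SHA256 stands in for the PRP of Equation (14)."""
    msg = b""
    for part in (cryptosystem, field_family, group):
        raw = part.encode()
        # Length-prefix each component so ("ab", "c") and ("a", "bc") cannot collide.
        msg += len(raw).to_bytes(2, "big") + raw
    return hmac.new(master_key, msg, hashlib.sha256).digest()

MK = b"\x00" * 32      # placeholder master key for the demo only

k_ahe_f1_g1 = derive_key(MK, "AHE", "family-1", "group-1")
k_ahe_f2_g1 = derive_key(MK, "AHE", "family-2", "group-1")   # different field family
k_ope_f1_g1 = derive_key(MK, "OPE", "family-1", "group-1")   # different cryptosystem
k_ahe_f1_g2 = derive_key(MK, "AHE", "family-1", "group-2")   # different device group
```

Because derivation is deterministic, the key manager can regenerate any key from MK and the triple \((c, f, g)\) instead of storing every key.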
7 C3PO DEPLOYMENT AND SECURITY ANALYSIS
Next, we describe how graphs are deployed in the C3PO runtime (challenge 7) and discuss C3PO’s security properties.
7.1 Deployment Profile Generation
As defined in Section 2, the number of runtime tasks assigned for each vertex in the graph is called the deployment profile of the graph. A good deployment profile is required to avoid bottlenecks and ensure good resource utilization.
Utilization. To reason about the effectiveness of deployment profiles, we define utilization of a vertex for a time interval as the amount of time the vertex spends processing during that time interval. For instance, if a vertex spends 5 minutes in a 10 minute interval processing tuples and the rest of the time waiting for tuples to arrive, then it has a utilization of 0.5. As the utilization of a vertex approaches 1, we can assume it is starting to become a bottleneck. Good resource utilization is usually achieved by the programmer explicitly specifying the number of tasks for each computation vertex. Programmers are perfectly suited to do this, as they understand, via application logic, which vertices handle more data or computation and can correspondingly allocate more tasks for those vertices. In C3PO, when the computation graph is transformed and operations are converted to their cryptographic equivalents, the utilization of a vertex changes substantially. This means that programmers need to thoroughly understand the overheads of each cryptosystem, which goes against C3PO’s design goals.
Fig. 6. C3PO deployment heuristic.
Heuristic. We propose a linear programming-based heuristic that automatically converts the deployment profile for a plaintext graph into an optimized deployment profile for the corresponding C3PO graph. Figure 6 shows the formal representation of the heuristic that we use. S represents the slots available for instances to use and V represents the vertices in the graph that need to be allocated. A slot is typically a Java virtual machine (JVM) or an executor thread within a JVM. We assume all slots have the same processing capacity. Matrix A represents how much each vertex amplifies its input. A is derived by executing the plaintext graph on sample data. To compute the amount of data arriving at a vertex, we consider all paths of varying lengths that end up at that vertex from the source. Matrix A gives the amount of data at vertices that are one edge away from the source (for unit input). To find data arriving at the vertex i (represented by \(d_i\)) through paths of length 2 and higher, we compute the power matrix of A represented in Figure 6 as \(A^2\), \(A^3\), and so on. Vector C represents the load on each vertex relative to one another. C is derived by inverting the number of instances for each vertex (from the deployment profile) in the plaintext version of the graph and then scaling it with respect to the crypto operations performed by the vertex.
For example, assume that a programmer specifies the number of instances for each vertex as \(v_1:1, v_2:3, v_3:2, v_4:6\) for the plaintext graph presented in Figure 2. We create a vector of these values and invert them to get the vector \([1, 1/3,1/2,1/6]\), normalized to \([6, 2, 3, 1]\). The intuition here is that for vertices that come under heavy load, the programmer will allocate a higher number of instances in the deployment profile to accommodate the load. At the next step, we scale down the value of each element in the above vector based on a reduction factor. This reduction factor is derived empirically based on our observations. We use a reduction factor of 6 for re-encryption nodes, 3 for AHE and MHE schemes, and 2 for all other crypto operations. Consequently, and based on Figure 2, if \(v_3\) receives a stream (\(s_2\)) with a field encrypted under AHE, then we scale down the value corresponding to \(v_3\) to 1, after which, we get \(C=[6, 2, 1, 1]\), and repeat for other vertices.
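The worked example above can be reproduced with exact arithmetic. In this sketch the normalization step is assumed to be multiplication by the least common multiple of the instance counts, which yields the vector \([6, 2, 3, 1]\) given in the text. The function name `load_vector` and the operation labels are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

# Reduction factors stated in the text.
REDUCTION = {"reencryption": 6, "AHE": 3, "MHE": 3, "other": 2}

def load_vector(instances, crypto_ops):
    """instances: tasks per vertex in the plaintext deployment profile.
    crypto_ops: per-vertex crypto operation label, or None if the vertex has none."""
    inverted = [Fraction(1, n) for n in instances]
    scale = reduce(lambda a, b: a * b // gcd(a, b), instances)   # LCM of the counts
    normalized = [f * scale for f in inverted]
    return [v / REDUCTION[op] if op else v
            for v, op in zip(normalized, crypto_ops)]

# v1:1, v2:3, v3:2, v4:6, with v3 receiving a field encrypted under AHE.
plain = load_vector([1, 3, 2, 6], [None, None, None, None])
C = load_vector([1, 3, 2, 6], [None, None, "AHE", None])
```

Using `Fraction` keeps intermediate values exact when a reduction factor does not divide a normalized entry evenly.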
In Figure 6, T represents the deployment profile, and \(t_i\) represents the number of slots allocated to execute vertex \(v_i\). Our target now is to derive each \(t_i\). We define two sets of constraints. The first set of constraints ensures each vertex is allocated to at least one slot. The second set of constraints ensures that for all vertices, the load, c, is less than the capacity of the nodes to process it. Under these constraints, we maximize the amount of data that can be consumed at the source vertex.
7.2 C3PO Scheduler
The primary responsibility of the C3PO scheduler is to decide on which host machine(s) each vertex of the graph will be executed. The C3PO scheduler is provided with two lists of hostnames, one that lists hosts in the untrusted cloud, and another that lists hosts in the trusted tier. The scheduler reads the graph annotation to identify where each vertex must be executed.
For components that need to be executed in the trusted tier, the scheduler sends the appropriate class files to workers running in the trusted tier. Trusted tier workers have access to the secret keys required for encryption/decryption. The workers in the untrusted cloud can only access encrypted data and only have access to the public keys required to perform the homomorphic operations.
We note that the scheduler service can be deployed in the untrusted cloud. An attacker can try to manipulate the scheduler in the following ways:
(a) attempt to execute trusted vertices in the untrusted cloud, and
(b) attempt to execute untrusted code in the trusted tier.
Attack (a) does not compromise confidentiality, since the untrusted cloud does not possess the secret keys required to reveal the plaintext data. However, attack (b) can compromise confidentiality if the attacker is successful in executing malicious code that retrieves secret keys or reads data when they are in plaintext while being re-encrypted. To avoid this, a hash of the vertices to be executed in the trusted tier is generated before deployment. When tasks are delivered to the trusted tier for execution, the trusted tier first computes a hash of the task class and compares it with the hash generated before deployment. Execution proceeds only if the hash is verified.
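The hash check performed by the trusted tier can be sketched as below. The text does not name the hash function, so SHA-256 is an assumption, as are the function names and the placeholder byte strings standing in for class files.

```python
import hashlib
import hmac

def digest(class_bytes: bytes) -> str:
    """Hash of a task class file; SHA-256 is assumed here."""
    return hashlib.sha256(class_bytes).hexdigest()

def verify_task(class_bytes: bytes, expected: str) -> bool:
    """Trusted-tier check: run the task only if its hash matches the pre-deployment hash."""
    return hmac.compare_digest(digest(class_bytes), expected)

# Before deployment: record the hash of the vertex meant for the trusted tier.
trusted_class = b"placeholder bytes of a re-encryption vertex class"
expected_hash = digest(trusted_class)

# At delivery time: the genuine class passes, a substituted one is rejected.
accepted = verify_task(trusted_class, expected_hash)
rejected = verify_task(b"placeholder bytes of attacker-supplied code", expected_hash)
```

For the check to be meaningful, the expected hashes must reach the trusted tier over a path that the untrusted scheduler cannot modify.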
7.3 Security Analysis
In this section, we analyze threats across various system components of C3PO such as IoT devices, cloud nodes, trusted tier, and the network and describe how C3PO addresses these threats.
Threat 1: Cloud compromises. The main security objective of C3PO is to preserve the confidentiality of data at rest and data in use in the presence of a semi-honest adversary. The adversary is expected to have read-only access to the data in persistent storage and the main memory of the cloud nodes. C3PO defends against this attack by never revealing secret keys or plaintext values of sensitive data to the cloud. Through the use of PHE and PPE, sensitive data remains encrypted both when stored and when being used. Furthermore, through the use of secondary homomorphic operations and associated optimizations (Section 5), C3PO allows computations between sensitive and non-sensitive data without revealing information about the sensitive input values or the output values. As discussed in Section 3.1, C3PO can optionally use encryption schemes that reveal relationships among data items. The use of such encryption schemes can be reduced by making use of the trusted tier and through the use of alternative encryption schemes discussed in Section 10.1. Even though C3PO is primarily concerned with a passive attacker, it also prevents active attackers from executing malicious code in the trusted tier by verifying the hash of code deployed in the trusted tier (Section 7.2).
Threat 2: Data in transit attacks. A passive attacker can attempt a network snooping attack to extract keys and sensitive data while they are being transmitted over the network. To ensure the secure delivery of keys, C3PO first establishes a secure TLS-based connection between the key manager and IoT devices and transmits keys over TLS. By default, C3PO sends PHE- and PPE-encrypted data with TLS disabled, since the data is already encrypted. To improve the confidentiality guarantees for data encrypted under PPE schemes, or to prevent integrity attacks in case of an active network attacker, C3PO can be configured to send data through a TLS channel. This means that the data is first encrypted under an appropriate PHE or PPE scheme, and in addition encrypted using a probabilistic authenticated encryption scheme such as AES-GCM, ensuring the confidentiality and integrity of transmitted data. Once the data is received by the cloud nodes, the outer layer of encryption is removed, leaving the data encrypted under a PHE or PPE scheme so the data can be homomorphically analyzed.
Threat 3: IoT device compromises. IoT devices are vulnerable to physical attacks that can compromise secret keys (for asymmetric encryption schemes only the public key is made available to the IoT devices). A subset of these keys is shared between multiple IoT devices, since this is a requirement for homomorphic operation correctness. C3PO does not prevent IoT key compromises altogether but reduces the effect of such key compromises. In particular, C3PO introduces two key invariants and field-level key identification (Section 6.4), which limit the number of keys that need to be shared across IoT devices. Field-level key identification also limits the number of fields encrypted under the same key while maximizing the range of homomorphic operations that can be performed across fields of data. To further reduce the effect of key compromises, C3PO introduces a multi-group mode (Section 6.3) that limits the number of devices that share common keys even further and allows frequent key rotations while minimizing service disruptions.
8 IMPLEMENTATION
In this section, we lay out some of the implementation details of C3PO.
8.1 Storm Integration
C3PO’s processes in the cloud are implemented by modifying Apache Storm. Storm is an online, distributed computation system. Application logic in Storm is packaged into directed graphs called topologies. Vertices of the topologies are computation components and edges represent data flow between components. There are two types of components in Storm:
(1) spouts, which act as event generators, and
(2) bolts, which capture the program logic.
In other words, spouts produce the data streams upon which the bolts operate. Modifications to Storm are limited to implementing a new scheduler by overriding the
8.2 Cryptosystems
Our cryptosystems, including the extensions and optimizations described in Section 5, are implemented in C (see footnote 5) and accessed where necessary through the Java Native Interface (JNI). Randomized encryption (RND) is implemented using the AES [13] cryptosystem in CBC mode with a random initialization vector. Deterministic encryption (DET) is implemented using an AES pseudo-random permutation block cipher with a variant of CMC mode [27] with a zero initialization vector. FNR [15] is used as an alternative DET cryptosystem to preserve the format of small values. The cryptosystem of Boldyreva et al. [7] is used as our OPE scheme implementation and that of Song et al. [58] as our SRCH scheme.
We implemented the Paillier [43] cryptosystem as our AHE scheme and followed the approach of Damgård and Jurik [14] to set the generator \(g = N + 1\) for a more optimized implementation of encryption. We also used the Chinese Remainder Theorem to optimize the decryption function of Paillier. Finally, we implemented ElGamal [18] as the MHE scheme. Paillier and ElGamal require arbitrary precision arithmetic computations as part of their encryption, decryption, and homomorphic operations. We implemented three different versions of Paillier and ElGamal, each using a different arbitrary precision arithmetic library, since not all these libraries are supported on all IoT devices. We use the GMP library [62] (version 6.1.2) and its
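To make the two Paillier optimizations mentioned above concrete, the following is a minimal Python sketch, not the C implementation used by C3PO. It assumes Python 3.8 or later (for `pow(x, -1, n)`), uses a deliberately tiny toy key built from two Mersenne primes, and all function names are hypothetical. With \(g = N + 1\), the term \(g^m \bmod N^2\) collapses to \(1 + mN\), so encryption needs a single modular exponentiation; CRT decryption replaces one exponentiation modulo \(N^2\) with two smaller ones modulo \(p^2\) and \(q^2\).

```python
import math
import secrets

# Toy key: two Mersenne primes. Far too small for real use; the evaluation
# in this article uses a 2,048-bit modulus.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
N2 = N * N


def encrypt(m):
    """Paillier encryption with g = N + 1, so g^m mod N^2 is just 1 + m*N."""
    while True:
        r = secrets.randbelow(N - 1) + 1  # random r in [1, N-1]
        if math.gcd(r, N) == 1:
            break
    return ((1 + m * N) * pow(r, N, N2)) % N2


def decrypt(c):
    """Textbook decryption: m = L(c^lambda mod N^2) * mu mod N."""
    lam = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
    mu = pow(lam, -1, N)  # inverse of lambda mod N; valid because g = N + 1
    return ((pow(c, lam, N2) - 1) // N) * mu % N


def _decrypt_mod_prime(c, p):
    """Recover m mod p using exponent p - 1 and modulus p^2 only."""
    p2 = p * p
    h = pow((pow(N + 1, p - 1, p2) - 1) // p, -1, p)
    return ((pow(c, p - 1, p2) - 1) // p) * h % p


def decrypt_crt(c):
    """CRT decryption: two half-size exponentiations, then recombination."""
    m_p = _decrypt_mod_prime(c, P)
    m_q = _decrypt_mod_prime(c, Q)
    return m_q + Q * (((m_p - m_q) * pow(Q, -1, P)) % P)


def add(c1, c2):
    """Additive homomorphism: multiplying ciphertexts adds the plaintexts."""
    return (c1 * c2) % N2
```

For example, `decrypt_crt(add(encrypt(1234), encrypt(5678)))` yields the sum of the two plaintexts, and `decrypt` and `decrypt_crt` agree on every valid ciphertext.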
9 EVALUATION
We evaluated C3PO using standard benchmarks and use cases. Our evaluation shows that C3PO can preserve confidentiality by executing on encrypted data with 20%–30% higher latency and around \(23\%\) reduction in throughput. We use several scenarios for evaluation as follows:
Encryption latency: We assess the feasibility of using C3PO with resource-constrained devices by analyzing the encryption latency of various cryptosystems and associated optimizations used by C3PO on IoT devices of the aforementioned C2 or higher classes (see Section 3.2).
Smart meter analytics: We use the Smart* dataset [4] as our input together with queries adapted from IoTAbench [2] to compare the throughput of C3PO to vanilla Storm. In this scenario, the volume of processed data is of primary concern. This includes an assessment of field masking.
Heartbeat analysis: We use a heartbeat analysis application that computes individual and group statistics. We use this application to evaluate the latency of C3PO; query response times are critical in such healthcare applications for triggering emergency responses in a timely manner. This includes an assessment of PRN pre-computation and post-encryption packing.
Yahoo streaming: We also use the more generic Yahoo Streaming Benchmark (YSB) [11] for further evaluating the latency of C3PO. Latency is critical in this benchmark, as the goal is to react quickly to advertisements.
Linear road: We use the Linear Road Benchmark (LRB) [1], a stream processing benchmark that requires re-encryption, to assess the effectiveness of our deployment heuristic, analyzing throughput, latency, and resource utilization, which is key to efficient deployment.
Multiple groups: We use a microbenchmark to analyze the effects of C3PO’s multi-group feature.
New York taxi statistics: Finally, we evaluate the costs of re-keying by computing statistics over a large number of nodes (devices) based on a publicly available dataset [63] from New York taxis released under FOIL (Freedom of Information Law).
9.1 Encryption Latency
To evaluate the feasibility of our approach from the point of view of the end devices, we consider the encryption latency of various cryptosystems on different IoT devices. We use five devices with different computational/memory capacities:
2xl: Amazon AWS
Pi3: Raspberry Pi 3 Model B with a quad-core 1.2 GHz Broadcom BCM2837 CPU and 1 GB RAM.
Pi0: Raspberry Pi Zero W with a 1 GHz 32-bit single-core CPU and 512 MB RAM.
A8: ARM Cortex-A8 with a 600 MHz 32-bit microprocessor and 256 MB RAM.
M3: ARM Cortex-M3 with a 72 MHz 32-bit microprocessor and 64 KB RAM.
We evaluate two PPE schemes (AES and FNR) and two PHE schemes (ElGamal and Paillier), each implemented under different libraries, as explained in Section 8:
(1) the NTV native implementation of AES (see footnote 6), which is also used internally in the FNR cryptosystem;
(2) the OpenSSL library;
(3) the GMP library;
(4) the BDS BigDigits library.
We use a 128-bit block for AES and a 2,048-bit modulus for Paillier and ElGamal.
Fig. 7. Encryption latency of various PPE (time in microseconds) and PHE (time in milliseconds) schemes across different IoT devices. y-axis in log scale.
Figure 7 shows the execution time for encrypting a random 128-bit string for AES and FNR and a random 32-bit
Fig. 8. Encryption latency of ElGamal and Paillier with PRN pre-computation across different IoT devices. y-axis in log scale.
Fig. 9. Encryption latency of ElGamal and Paillier with ciphertext packing across different IoT devices. y-axis in log scale.
As shown in Figures 8 and 9, our proposed optimizations dramatically improve the performance of ElGamal and Paillier across all IoT devices. With PRN pre-computation, ElGamal encryption implemented in GMP is slightly faster than SSL and takes only 0.4 μs on a 2xl device, 5 μs on Pi3, 18 μs on Pi0, and 18 μs on A8. Even in the worst case, ElGamal implemented using the less optimized BigDigits library takes only 1.7 ms on the most resource-constrained device, M3. PRN pre-computation has similar benefits for Paillier, with the worst case of implementing Paillier using BigDigits taking 41.6 ms on an M3 device. The ciphertext packing optimization allows us to pack two
9.2 Smart Meter Analytics
In this evaluation, we study the throughput of C3PO by running a set of analytical queries to analyze the electricity usage of homes. We use the Smart* dataset [4] as our input. This dataset represents electrical meter readings collected over a 24-hour period at the rate of one reading per minute from 443 unique homes, totaling 637,526 records. Each reading is a tuple of three fields:
Fig. 10. Smart meter analytics throughput.
Fig. 11. Heartbeat analysis response time.
We used a time window of 60 s and executed the queries for at least 600 s. We ran these queries on four m3.large nodes on Amazon EC2. For C3PO, one of the four nodes was specified as a trusted tier node. The bandwidth of the trusted node was throttled to 8 Mbit/s to simulate a wide area network link. The results of our evaluation are presented in Figure 10.
9.3 Heartbeat Analysis
Next, we study how C3PO can be used for an online healthcare application like a heartbeat monitor. The end-user application runs on specialized hardware (the monitoring IoT device) and counts the number of heartbeats per minute. The monitoring device uses PRN pre-computation to efficiently encrypt this value and send it to the cloud for processing and storing. PRN pre-computation allows us to use IoT devices with computational capabilities as limited as those of the M3 node described above. The graph running in the cloud keeps track of daily, weekly, monthly, and yearly statistics. We use post-encryption packing to pack these four values into a single ciphertext, thereby reducing the ciphertext size by a factor of 4. The end-user may request to see these statistics on their device, in which case the data is retrieved from the cloud, decrypted on the monitoring device, and shown to the end-user. The statistics are maintained by two vertices, a “per user” vertex (\(v_1\)) and an “all users” vertex (\(v_2\)). User statistics are distributed across the multiple instances of \(v_1\). \(v_1\) also emits a summary of its per-user statistics every minute that is grouped by week, month, or year by \(v_2\) to find the average value across all users. The client device emits a message every time the client requests to see a specific data point, in response to which the requested values are retrieved. For this application, the most critical metric is the response time, i.e., the time a user has to wait after requesting to see a metric until the metric is displayed. We deploy \(v_1\) and \(v_2\) on three m3.medium nodes in EC2 and use a single end-user device deployed on an A8 device. We measure the response time as we increase the volume of incoming tuples to \(v_1\) and \(v_2\) by simulating additional end-user devices using an Apache Kafka queue.
Each of these end-user devices, including the one deployed on the A8 node, emits one tuple per minute containing an encrypted timestamp and the encrypted number of heartbeats. The results of this evaluation are presented in Figure 11, where we compare the response time of C3PO, C3PO-PP, which denotes C3PO with post-encryption packing disabled, and Storm. The top part of each stacked column indicates decryption overhead. We observe that response times for C3PO when excluding decryption time are very close to the plaintext version for up to 1,500 client devices after which C3PO’s response time degrades due to the increased load of \(v_1\) and \(v_2\) compared to the plaintext stream running on Storm. Decryption is a significant source of overhead. In C3PO without packing, the end-user device receives and decrypts four 4,096-bit Paillier ciphertexts containing the daily, weekly, monthly, and yearly statistics. Packing reduces that to a single ciphertext, which leads to lower network overhead and significantly lower decryption time (~\(4\times\) lower).
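The two optimizations used in this application can be illustrated with a short Python sketch. It is one plausible reading of PRN pre-computation and post-encryption packing for Paillier, not C3PO's actual implementation (which is described in Section 5): the random factor \(r^N \bmod N^2\) is computed ahead of time so that the online encryption step needs only multiplications, and the cloud shifts each ciphertext's plaintext into its own slot by exponentiation before multiplying the ciphertexts together. The toy key, the 32-bit slot width, the sample values, and all function names are assumptions made for illustration; Python 3.8 or later is assumed.

```python
import math
import secrets

P, Q = 2**61 - 1, 2**89 - 1  # toy Mersenne primes, NOT a secure key size
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)
SLOT_BITS = 32  # assumed width of one packed statistic


def precompute_prn():
    """Offline step on the device: the expensive r^N mod N^2 factor."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return pow(r, N, N2)


def encrypt_with_prn(m, prn):
    """Online step: with g = N + 1 only multiplications remain.
    Each pre-computed value must be used for a single encryption."""
    return ((1 + m * N) * prn) % N2


def decrypt(c):
    return ((pow(c, LAM, N2) - 1) // N) * MU % N


def pack(ciphertexts):
    """Cloud side: fold several ciphertexts into one without decrypting.
    Raising c to 2^(SLOT_BITS * i) shifts its plaintext into slot i."""
    packed = 1
    for i, c in enumerate(ciphertexts):
        packed = (packed * pow(c, 1 << (SLOT_BITS * i), N2)) % N2
    return packed


def unpack(m, count):
    """Device side: split one decrypted integer back into its slots."""
    mask = (1 << SLOT_BITS) - 1
    return [(m >> (SLOT_BITS * i)) & mask for i in range(count)]


# Made-up daily, weekly, monthly, and yearly averages.
stats = [72, 70, 68, 71]
pool = [precompute_prn() for _ in stats]
ciphertexts = [encrypt_with_prn(m, prn) for m, prn in zip(stats, pool)]
recovered = unpack(decrypt(pack(ciphertexts)), len(stats))
```

In this sketch the device performs one decryption instead of four, which matches the roughly fourfold reduction in decryption time reported above. Packing is only correct while every value fits in its slot and the packed total stays below \(N\).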
9.4 Yahoo Streaming
Fig. 12. YSB latency.
We use the Yahoo Streaming Benchmark (YSB) to study the latency of C3PO. This benchmark simulates an advertising analytics use case with several advertising campaigns, each containing several advertisements. The benchmark reads various events from Apache Kafka, identifies the events relevant to the advertisement campaign, and stores a windowed count of relevant events per campaign. The steps in this analytical processing pipeline are as follows:
(1) Read an event (in
(2) Deserialize the
(3) Filter out irrelevant events, based on the event type.
(4) Take a projection of the relevant fields keeping the ad ID and the event time.
(5) Join each event with its associated campaign. This ad-to-campaign mapping information is stored in a Redis in-memory data store.
(6) Take a windowed count of events per campaign and store each window in Redis along with a timestamp of the time the window was last updated in Redis.
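Steps (3) to (6) of the pipeline above can be sketched in a few lines of Python over plaintext events. This is an illustrative stand-in rather than the benchmark's code: reading and deserializing events (steps 1 and 2) are left out, plain dictionaries replace Redis, and the field names (`event_type`, `ad_id`, `event_time`), the relevant event type, and the 10 s window length are assumptions.

```python
import time
from collections import defaultdict

WINDOW_MS = 10_000  # assumed window length in milliseconds


def process(events, ad_to_campaign, relevant_type="view", now=time.time):
    """Run steps (3)-(6) over already-deserialized events (dicts)."""
    windows = defaultdict(lambda: {"count": 0, "updated": None})
    for event in events:
        # Step 3: filter out irrelevant events, based on the event type.
        if event["event_type"] != relevant_type:
            continue
        # Step 4: project onto the ad ID and the event time.
        ad_id, event_time = event["ad_id"], event["event_time"]
        # Step 5: join with the campaign (a dict stands in for Redis).
        campaign = ad_to_campaign.get(ad_id)
        if campaign is None:
            continue
        # Step 6: windowed count per campaign plus a last-updated timestamp.
        window_start = event_time - event_time % WINDOW_MS
        slot = windows[(campaign, window_start)]
        slot["count"] += 1
        slot["updated"] = now()
    return dict(windows)
```

In an encrypted deployment, the equality tests in steps (3) and (5) would operate on ciphertexts rather than on the plaintext values shown here.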
The input data for this evaluation is generated using a
9.5 Linear Road
We use the popular Linear Road Benchmark (LRB) that models variable toll calculation for a city or county to assess C3PO’s deployment heuristic. LRB simulates vehicles traveling through an expressway with vehicles generating position reports at fixed time intervals. Position reports contain information such as the expressway ID, direction of travel, lane of travel, mile marker, offset within the mile, and so on. These position reports are processed by a toll levying agency to dynamically:
(1) calculate the amount of toll to be levied on the vehicle and
(2) identify accident locations to alert vehicles upstream of the accident.
LRB also specifies latency invariants such as the time within which a toll must be calculated and the time within which an accident has to be identified. The upper limit within which the system needs to report tolls and accidents is 5 s. The benchmark rates the system by the highest number of expressways (L) the system can support while maintaining these invariants. Figure 13(a) shows the Storm topology that implements the standard linear road, and Figure 13(b) shows the transformed C3PO topology. The latter topology contains two new vertices, \(v_6\) and \(v_7\), which are re-encryption vertices that are executed within the trusted tier. We ran the experiment for three hours. The rate at which position reports are emitted for one single expressway is shown in Figure 14. To test the system under both low and high loads, the rate of input is designed to steadily increase up to 1,811 tuples/s.
Fig. 13. LRB graph. Shaded nodes represent re-encryption vertices that are executed within the trusted tier.
Fig. 14. LRB data profile.
Fig. 15. Storm LRB baseline.
Fig. 16. C3PO LRB baseline.
LRB baseline and hypothesis validation. We first ran a baseline deployment of LRB by assigning each vertex a single task. This allows us to observe each individual vertex to see how they consume resources and verify the hypothesis made in Section 7.1 that bottlenecks change when running on encrypted data streams. We plot utilization as defined in Section 7 against time in Figure 15 for Storm with plaintext data and in Figure 16 for C3PO with encrypted data. We observe that in Storm vertices \(v_4\) and \(v_2\) have the highest utilization values until around the 8,000 s mark, and after that vertex \(v_1\) becomes the node with the highest load. This increase is because the number of tuples that require a toll notification increases substantially after 8,000 s. In the transformed C3PO graph running on encrypted streams, \(v_5\) and \(v_1\) come under high load until 8,000 s, and after that the bottleneck at \(v_1\) becomes yet more prominent. This validates our hypothesis that primary bottlenecks differ between graphs running on plaintext vs. encrypted streams.
| System | Deployment profile | Average response time (ms) | # of expressways supported |
| Storm | 5, 4, 1, 3, 2 | 2,694.44 | 20 |
| C3PO | 5, 2, 1, 2, 3, 1, 1 | 2,672.97 | 15 |
Table 2. LRB Comparison
Performance of C3PO deployment profile. Next, we benchmark both the Storm topology graph and the transformed C3PO graph for LRB. We deploy both graphs on 15 m3.large nodes in Amazon EC2 using the best possible configuration so that the maximum number of expressways supported can be identified. We show the results in Table 2. For plaintext streams Storm supports 20 expressways, while C3PO with encrypted streams supports 15 expressways. We also plot response times for all notification-triggering tuples, i.e., the time from a tuple entering the system until its notification is issued. The response times are shown in Figure 17 for Storm and in Figure 18 for C3PO. Response times for C3PO peak faster than those for Storm, but for 15 expressways C3PO is able to keep the response time below the threshold allowed by the benchmark.
Fig. 17. Response time for LRB on Storm. Fig. 18. Response time for LRB on C3PO.
Effectiveness of analytical model. The effectiveness of the model can be evaluated by looking at how well the model converts the deployment profile for the plaintext streams to the deployment profile for the encrypted streams in C3PO. Vertices with higher utilization values should get more instances to execute them. Table 3 shows the response time of the C3PO deployment profile compared to other deployments. As can be seen, the deployment profile generated by C3PO results in the lowest response time. This profile is also in accordance with Figure 16, which shows that vertices \(v_1\) and \(v_5\) should get the highest numbers of instances.
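The principle that vertices with higher utilization should receive more instances can be illustrated with a simple proportional allocator. The Python sketch below is not the analytical model of Section 7; it is a generic largest-remainder allocation, and the utilization values in the usage note are invented for illustration only.

```python
def allocate_instances(utilization, total):
    """Split `total` instances across vertices in proportion to utilization.

    `utilization` maps a vertex name to a non-negative integer score (for
    example, a percentage). Every vertex gets at least one instance; the
    remaining instances are distributed by the largest-remainder method.
    """
    if total < len(utilization):
        raise ValueError("need at least one instance per vertex")
    spare = total - len(utilization)
    weight = sum(utilization.values())
    alloc = {}
    remainders = {}
    for vertex, score in utilization.items():
        # Integer arithmetic keeps the result deterministic.
        alloc[vertex] = 1 + (spare * score) // weight
        remainders[vertex] = (spare * score) % weight
    leftover = total - sum(alloc.values())
    by_remainder = sorted(remainders, key=lambda v: remainders[v], reverse=True)
    for vertex in by_remainder[:leftover]:
        alloc[vertex] += 1
    return alloc
```

Calling `allocate_instances({"v1": 90, "v2": 30, "v3": 10, "v4": 30, "v5": 60, "v6": 10, "v7": 10}, 15)` with these made-up scores gives the largest shares to \(v_1\) and \(v_5\) while still assigning every vertex at least one instance.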
9.6 Multiple Groups
In this evaluation, we look at the effectiveness of using multiple key groups as outlined in Section 6. We evaluate the throughput (tuples/s) of a
Fig. 19. Effect of varying the number of key groups.
9.7 New York Taxi Statistics
The New York taxi statistics application finds the 10 most frequent routes during the last 30 minutes of taxi servicing. A route is represented by a starting grid cell and an ending grid cell. The data for this application is based on a publicly available taxi dataset released under FOIL (Freedom of Information Law). The input data contains the locations (latitude and longitude) of passenger pick-ups and drop-offs, MD5 digests of the medallions of the taxis that picked up the passengers, and the trip times. The dataset contains records that span over a year. The application emits an output tuple whenever there is a change in the top 10 values. We define response time as follows: Given a tuple t that causes the top-10 values to change from tuple \(top10_{t-1}\) to \(top10_t\), and a function \(T(x)\) that gives us the time at which tuple x is emitted, the response time is \(T(top10_t) - T(t)\). To evaluate the effect of key changes on response time, we simulate a key change at the beginning of each month. This means that all data emitted with a timestamp within the first 30 minutes of every month is encrypted under both the old and new keys. We deployed this application on 10 m3.large nodes in Amazon EC2. Table 4 summarizes the results of these runs. We can see that C3PO with no key changes completes processing the data with only a 23.8% increase in completion time and a 25.1% increase in average response time compared to Storm running on a plaintext stream. Furthermore, the increases in completion time and average response time caused by a monthly key change are minimal (about 1%). Figure 20 shows the response times for the full 10,000 s run that processes a year of data with key changes in the input data every month.
In this plot, we can see intermittent spikes (total of 12) in response time for some tuples around the time a key change is in progress, but the majority of tuples (90th percentile within 31 ms and 99th percentile within 818 ms) respond with the same response time as when no change was in effect.
Fig. 20. Response time of top-10 taxi route query with monthly key change.
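The query logic of this application can be sketched over plaintext tuples as follows. This Python sketch is illustrative only: it assumes trips arrive in timestamp order, breaks ties between equally frequent routes by route name (an arbitrary choice), and uses a hypothetical class name. A new top-k list is returned only when it differs from the previous one; measuring the wall-clock time from the arrival of tuple t until that list is emitted gives \(T(top10_t) - T(t)\).

```python
from collections import Counter, deque

WINDOW_S = 30 * 60  # 30-minute sliding window, in seconds


class TopRoutes:
    """Track the k most frequent routes over a sliding time window."""

    def __init__(self, k=10):
        self.k = k
        self.window = deque()   # (timestamp, route) in arrival order
        self.counts = Counter()
        self.top = []

    def on_trip(self, timestamp, start_cell, end_cell):
        """Process one trip; return the new top-k list if it changed."""
        route = (start_cell, end_cell)
        self.window.append((timestamp, route))
        self.counts[route] += 1
        # Expire trips that fell out of the 30-minute window.
        while self.window and self.window[0][0] <= timestamp - WINDOW_S:
            _, old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        ranked = sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))
        new_top = [r for r, _ in ranked[: self.k]]
        if new_top != self.top:
            self.top = new_top
            return new_top
        return None
```

Re-sorting all routes on every tuple keeps the sketch short; a production query would maintain the ranking incrementally.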
10 RELATED WORK
The advent of cloud computing has led to the need for highly scalable stream processing systems and has resulted in the next generation of stream processing systems such as Storm, Heron [34], Spark streaming [66], and Samza [40]. C3PO’s design is based on Storm, but our proposed concepts for preserving data confidentiality in the context of continuous query execution can be applied to other stream processing systems. In what follows, we overview related work in the area of IoT-based confidentiality-preserving stream processing.
10.1 Computing over Encrypted Data
In his seminal work, Gentry introduced an implementable FHE scheme [20]. FHE has become more practical since then [21, 37] but is still not suited for encryption-enabled continuous query processing due to its prohibitive cost. Instead, C3PO follows the approach of several related research works, focusing on using PHE and PPE schemes to perform computations over encrypted data. The OPE scheme used in C3PO is based on the construction of Boldyreva et al. [7], the implementation of which is openly available. Newer OPE constructions exist that offer ideal security [47] and defend against ciphertext frequency analysis [32, 33]. C3PO’s current OPE scheme can be replaced with an implementation of the aforementioned schemes to support order comparisons. Similarly, the SWP [58] scheme C3PO uses for searching over encrypted data can be replaced with other constructions [26] that allow searching over encrypted data.
10.2 PHE-based Systems
CryptDB [48] is a database system focusing on executing SQL queries on encrypted data using PHE. CryptDB uses a proxy to intercept client queries and transform them into queries that operate over encrypted data. Crypsis [59, 60] is a runtime system built on Apache Pig that analyzes and transforms data flow graphs to generate semantically equivalent graphs that are deployed in public clouds and executed over encrypted data. SecureScala [28] is a domain-specific language in Scala that allows expressing secure programs without requiring any cryptographic knowledge from the programmer. Cuttlefish [53] is another recent system that uses PHE. Cuttlefish is built on Spark and introduces Secure Data Types, which allow an application programmer to specify intrinsic properties about the structure and constraints of data; these in turn enable a set of compilation techniques that let Cuttlefish generate more optimized queries. Seabed [45] introduces an additively symmetric homomorphic encryption scheme to perform aggregations on large encrypted datasets efficiently. Symmetria [52] is a PHE-based system built on top of Spark that introduces two symmetric PHE schemes, replacing the more computationally expensive asymmetric PHE schemes. JEDI [35], an end-to-end encryption scheme, leverages the hierarchical resource structure of IoT systems to delegate keys in a decentralized manner across multiple hops. JEDI incorporates WKD-IBE and AES schemes and assembly-level optimizations to support embedded IoT devices. Talos [57] builds on the capabilities of CryptDB and introduces cryptographic primitives that work on low-power devices. Pilatus [56] introduces an encrypted data-sharing scheme based on re-encryption, with revocation capabilities and in situ key updates. TimeCrypt [9] proposes an efficient encryption scheme based on additive symmetric homomorphic encryption for time series data by mapping the keys to time.
Droplet [55] proposes a decentralized access control mechanism to access encrypted data present on the cloud using blockchain technology.
All the aforementioned systems are built around a storage system (e.g., CryptDB, Talos, and Pilatus are based on MySQL; Droplet stores state in an SQLite database; TimeCrypt uses a back-end based on Cassandra; Crypsis is built on Apache Pig; and Cuttlefish and Seabed are based on Spark batch processing). None of these systems considers streaming workloads as supported by C3PO, and therefore none addresses the issues that arise in this setting, such as key management to limit the effect of key compromises, efficient cloud deployments for streaming applications, or computations that involve sliding windows with possible key changes in between.
10.3 Trusted Hardware-based Systems
Another way to enforce data confidentiality is through the use of specialized hardware that provides a trusted execution environment. An approach that is gaining popularity is to use an Intel SGX-enabled processor (see footnote 7), which offers trusted execution environments (so-called enclaves) in which data computations can be carried out confidentially. SGX offers hardware-encrypted and integrity-protected physical memory, which allows data and code to reside in the untrusted cloud. SecureCloud [31] shows how SGX can be used to enable secure and private execution of big data applications in the cloud. Havet et al. [29] describe the design of SecureStreams, a streaming system that uses SGX to preserve confidentiality. SecureStreams uses a Lua VM running inside SGX enclaves to capture worker and router components. Workers handle the application logic, and routers use a dispatching policy to handle message passing from one worker to another. Each component is then wrapped in a Docker container for isolation and ease of deployment. SecureStreams demonstrates that a streaming system using SGX is feasible. In contrast, C3PO focuses on PHE and does not require specialized hardware. In addition, C3PO addresses challenges such as managing keys and allows for different deployments that improve performance. Trusted hardware approaches are orthogonal to the PHE approach used by C3PO and could also be used to extend the design of C3PO by allowing secure computations to be performed on trusted hardware in an untrusted cloud. Cuttlefish shows the benefits of using SGX selectively when hitting the limits of PHE (for re-encryption).
11 CONCLUSIONS
We presented C3PO, a practical distributed system for evaluating continuous queries over encrypted data streams in public clouds. C3PO makes computations over encrypted data practical for stream processing by using a novel API, encryption inference, automatic re-encryption, and a set of other original optimizations. The C3PO API allows programmers to develop secure applications with little or no knowledge of the underlying cryptosystems. We evaluated our approach using standard benchmarks and applications, demonstrating its applicability and performance. Our evaluations show that we can meet latency requirements even with high volumes of encrypted traffic.
Footnotes
1 https://www.smartthings.com.
2 https://www.nest.com.
3 In the Star Wars™ saga, C-3PO is a risk-averse droid with a strong need for security and stability.
4 http://storm.apache.org.
5 https://github.com/ssavvides/homomorphic-c.
6 https://github.com/kokke/tiny-AES-c.
7 https://software.intel.com/en-us/sgx.
- [1] 2004. Linear road: A stream data management benchmark. In International Conference on Very Large Data Bases (VLDB). 480–491.
- [2] 2015. IoTAbench: An internet of things analytics benchmark. In ACM/SPEC International Conference on Performance Engineering. 133–144.
- [3] 2020. Retrieved from https://developer.arm.com/ip-products/security-ip/trustzone.
- [4] 2012. Smart*: An open data set and tools for enabling research in sustainable homes. Workshop on Data Mining Applications in Sustainability 111, 112 (2012), 108.
- [5] 1994. Dense probabilistic encryption. In Workshop on Selected Areas of Cryptography. 120–128.
- [6] 2020. Retrieved from http://www.di-mgt.com.au/bigdigits.html.
- [7] 2009. Order-preserving symmetric encryption. In International Conference on Theory and Applications of Cryptographic Techniques (EUROCRYPT), Vol. 5479. 224–241.
- [8] 2014. Terminology for Constrained-node Networks. RFC 7228. RFC Editor. Retrieved from https://www.rfc-editor.org/rfc/rfc7228.txt.
- [9] 2020. TimeCrypt: Encrypted data stream processing at scale with cryptographic access control. In Networked System Design and Implementation (NSDI). 835–850.
- [10] 2003. Self-organized public-key management for mobile ad hoc networks. IEEE Trans. Mob. Comput. 2, 1 (2003), 52–64.
- [11] 2016. Benchmarking streaming computation engines: Storm, Flink and Spark streaming. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 1789–1792.
- [12] 2020. Contact tracing mobile apps for COVID-19: Privacy considerations and related trade-offs. CoRR abs/2003.11511 (2020).
- [13] 2002. The Design of Rijndael: AES - The Advanced Encryption Standard. Springer.
- [14] 2001. A generalisation, a simplification and some applications of Paillier’s probabilistic public-key system. In International Workshop on Public Key Cryptography. 119–136.
- [15] 2014. FNR: Arbitrary length small domain block cipher proposal. In Conference on Security, Privacy, and Applied Cryptography Engineering (SPACE), Vol. 8804. 146–154.
- [16] 2008. The Transport Layer Security (TLS) Protocol. RFC 5246. RFC Editor. Retrieved from http://www.rfc-editor.org/rfc/rfc5246.txt.
- [17] 2019. Ensuring confidentiality in the cloud of things. IEEE Pervas. Comput. 18, 1 (2019), 10–18.
- [18] 1985. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Trans. Inf. Theory 31, 4 (1985), 469–472.
- [19] 2007. Answering aggregation queries in a secure system model. In International Conference on Very Large Data Bases (VLDB). 519–530.
- [20] 2009. Fully homomorphic encryption using ideal lattices. In Symposium on Theory of Computing (STOC). 169–178.
- [21] 2012. Homomorphic evaluation of the AES circuit. In Annual International Cryptology Conference (CRYPTO), Vol. 7417. 850–867.
- [22] 2017. Practical passive leakage-abuse attacks against symmetric searchable encryption. In International Joint Conference on e-Business and Telecommunications (ICETE). 200–211.
- [23] 1984. Probabilistic encryption. J. Comput. Syst. Sci. 28, 2 (1984), 270–299.
- [24] 2017. Leakage-abuse attacks against order-revealing encryption. In IEEE Symposium on Security and Privacy (SP). 655–672.
- [25] 2014. Bolt: Data management for connected homes. In Conference on Networked Systems Design and Implementation (NSDI). 243–256.
- [26] 2014. Searchable encryption with secure and efficient updates. In International Conference on Computer and Communications Security (CCS). 310–320.
- [27] 2003. A tweakable enciphering mode. In Annual International Cryptology Conference (CRYPTO), Vol. 2729. 482–499.
- [28] 2016. SecureScala: Scala embedding of secure computations. In Symposium on Scala (SCALA). 75–84.
- [29] 2017. SecureStreams: A reactive middleware framework for secure data stream processing. In International Conference on Distributed and Event-based Systems (DEBS). 124–133.
- [30] 2020. Retrieved from https://keybase.io/.
- [31] 2017. SecureCloud: Secure big data processing in untrusted clouds. In Design, Automation & Test in Europe Conference & Exhibition (DATE). 282–285.
- [32] 2015. Frequency-hiding order-preserving encryption. In International Conference on Computer and Communications Security (CCS). 656–667.
- [33] 2019. An efficiently searchable encrypted data structure for range queries. In European Symposium on Research in Computer Security. 344–364.
- [34] 2015. Twitter heron: Stream processing at scale. In International Conference on the Management of Data (SIGMOD). 239–250.
- [35] 2019. JEDI: Many-to-many end-to-end encryption and key delegation for IoT. In USENIX Security Conference. 1519–1536.
- [36] 2013. Efficient and scalable IoT service delivery on cloud. In International Conference on Cloud Computing. IEEE Computer Society, 740–747.
- [37] 2017. A survey on fully homomorphic encryption: An engineering perspective. ACM Comput. Surv. 50, 6 (Dec. 2017).
- [38] . 2004. The design and implementation of datagram TLS. In Networks and Distributed Systems Security Symposium (NDSS).Google Scholar
- [39] . 2015. Inference attacks on property-preserving encrypted databases. In International Conference on Computer and Communications Security (CCS). 644–655. Google Scholar
Digital Library
- [40] . 2017. Samza: Stateful scalable stream processing at LinkedIn. Proc. VLDB Endow. 10, 12 (
Aug. 2017), 1634–1645. Google ScholarDigital Library
- [41] . 2006. HealthGear: A real-time wearable system for monitoring and analyzing physiological signals. In International Workshop on Wearable and Implantable Body Sensor Networks (BSN). 61–64.
- [42] 2020. Retrieved from https://www.openssl.org.
- [43] . 1999. Public-key cryptosystems based on composite degree residuosity classes. In International Conference on Theory and Applications of Cryptographic Techniques (EUROCRYPT). 223–238.
- [44] . 2012. Property preserving symmetric encryption. In International Conference on Theory and Applications of Cryptographic Techniques (EUROCRYPT), Vol. 7237. 375–391.
- [45] . 2016. Big data analytics over encrypted datasets with Seabed. In Symposium on Operating Systems Design and Implementation (OSDI). 587–602.
- [46] . 2019. Arx: An encrypted database using semantically secure encryption. Proc. VLDB Endow. 12, 11 (2019), 1664–1678.
- [47] . 2013. An ideal-security protocol for order-preserving encoding. In IEEE Symposium on Security and Privacy (SP). 463–477.
- [48] . 2011. CryptDB: Protecting confidentiality with encrypted query processing. In Symposium on Operating Systems Principles (SOSP). 85–100.
- [49] 2020. Retrieved from https://www.helpnetsecurity.com/2020/07/09/public-cloud-security-incident/.
- [50] . 1978. On data banks and privacy homomorphisms. Found. Sec. Comput. 4, 11 (1978), 169–180.
- [51] . 2020. Practical Confidentiality-Preserving Data Analytics in Untrusted Clouds. DOI: https://doi.org/10.25394/PGS.12645440.v1
- [52] . 2020. Efficient confidentiality-preserving data analytics over symmetrically encrypted datasets. Proc. VLDB Endow. 13, 8 (Apr. 2020), 1290–1303.
- [53] . 2017. Secure data types: A simple abstraction for confidentiality-preserving data analytics. In Symposium on Cloud Computing (SoCC). 479–492.
- [54] 2020. Retrieved from https://www.zerotier.com/.
- [55] . 2020. Droplet: Decentralized authorization and access control for encrypted data streams. In USENIX Security Conference. 2469–2486.
- [56] . 2017. Secure sharing of partially homomorphic encrypted IoT data. In Conference on Embedded Networked Sensor Systems (SenSys). 1–14.
- [57] . 2015. Talos: Encrypted query processing for the internet of things. In Conference on Embedded Networked Sensor Systems (SenSys). 197–210.
- [58] . 2000. Practical techniques for searches on encrypted data. In IEEE Symposium on Security and Privacy (SP). 44–55.
- [59] . 2014. Practical confidentiality preserving big data analysis. In Workshop on Hot Topics in Cloud Computing (HotCloud). USENIX Association.
- [60] . 2014. Program analysis for secure big data processing. In International Conference on Automated Software Engineering (ASE). ACM, 277–288.
- [61] . 2016. STYX: Stream processing with trustworthy cloud-based execution. In Symposium on Cloud Computing (SoCC). 348–360.
- [62] 2020. Retrieved from https://www.gmplib.org.
- [63] . 2014. FOILing NYC’s Taxi Trip Data. Retrieved from http://chriswhong.com/open-data/foil_nyc_taxi.
- [64] . 2016. An IoT-cloud based wearable ECG monitoring system for smart healthcare. J. Medical Syst. 40, 12 (2016), 286:1–286:11.
- [65] . 2008. A key management scheme using deployment knowledge for wireless sensor networks. IEEE Trans. Parallel Distrib. Syst. 19, 10 (2008), 1411–1425.
- [66] . 2012. Discretized streams: An efficient and fault-tolerant model for stream processing on large clusters. In Workshop on Hot Topics in Cloud Computing (HotCloud).