C3PO: Cloud-based Confidentiality-preserving Continuous Query Processing

Published: 23 November 2021

Abstract

With the advent of the Internet of things (IoT), billions of devices are expected to continuously collect and process sensitive data (e.g., location, personal health factors). Due to the limited computational capacity available on IoT devices, the current de facto model for building IoT applications is to send the gathered data to the cloud for computation. While building private cloud infrastructures for handling large amounts of data streams can be expensive, using low-cost public (untrusted) cloud infrastructures for processing continuous queries including sensitive data leads to strong concerns over data confidentiality.

This article presents C3PO, a confidentiality-preserving, continuous query processing engine, that leverages the public cloud. The key idea is to intelligently utilize partially homomorphic and property-preserving encryption to perform as many computationally intensive operations as possible—without revealing plaintext—in the untrusted cloud. C3PO provides simple abstractions to the developer to hide the complexities of applying complex cryptographic primitives, reasoning about the performance of such primitives, deciding which computations can be executed in an untrusted tier, and optimizing cloud resource usage. An empirical evaluation with several benchmarks and case studies shows the feasibility of our approach. We consider different classes of IoT devices that differ in their computational and memory resources (from a Raspberry Pi 3 to a very small device with a Cortex-M3 microprocessor) and through the use of optimizations, we demonstrate the feasibility of using partially homomorphic and property-preserving encryption on IoT devices.

1 INTRODUCTION

The ubiquity of computing devices is driving a massive increase in the amount of data generated by humans and machines. With the advent of the IoT, many more billions of devices are expected to continuously collect sensitive data and compute on it, promising improvements in various sectors. For instance, improvements in sensors and increasingly practical wearable devices allow complex, automatic, and real-time health monitoring [41]. Such monitoring is beneficial by providing patients direct information on their current health status, facilitating diagnosis and treatment, and reducing costs of interventions and risks.

1.1 Cloud-backed IoT and Confidentiality

Due to limited storage and computation capacity available on IoT devices, the current de facto model for building IoT applications is to send the data gathered from physical devices to the cloud for both computation and storage (e.g., SmartThings, Nest). Many IoT applications, therefore, leverage the cloud to compute on data streams from a large number of devices. For example, in smart health, healthcare providers can remotely monitor a larger number of patients, correlate data on a bigger scale, and detect abnormalities in health conditions at an early stage, which cannot be achieved with existing local infrastructure due to resource limitations. For instance, Yang et al. [64] propose a system that collects and displays real-time patient data and show how using the cloud alleviates issues of cross-platform deployment. Health service providers can also be disjoint from healthcare providers.

Due to the sheer amount of streaming data, building a private cloud infrastructure or expanding local infrastructure to support a large number of devices [36] is very expensive compared to using a low-cost public (untrusted) cloud infrastructure such as Amazon EC2 or Microsoft Azure. Therefore, public clouds are typically used for processing continuous queries including on sensitive data. Public clouds are also preferred because of the variety of software services they provide that make the development and deployment of corresponding applications very fast. However, this trend is fueling concerns over data confidentiality and is becoming one of the major factors preventing further widespread adoption of IoT solutions. In 2019, attacks leading to exposing user data or compromised accounts were experienced by 70% of organizations using public clouds [49]. These concerns represent a significant deterrent for industry domains like healthcare to adopt public clouds. If privacy concerns are addressed, then individuals may be more open to sharing their data, which is critical for contact tracing applications to help mitigate pandemics or epidemics [12].

One way to mitigate these concerns is to encrypt data at the source (i.e., IoT devices) and solely use cloud infrastructure for storage purposes (e.g., Bolt [25]). Thus, as long as encryption keys are maintained securely by consumers, the confidentiality of their data is enforced. While this approach addresses the above confidentiality concerns, all computations need to be performed in trusted environments, thus limiting the computational capabilities of public clouds for IoT solutions.

A promising approach to overcome this bottleneck is to use homomorphic encryption and execute all operations over encrypted data. However, fully homomorphic encryption (FHE) [20] causes significant slowdowns for complex computations despite continuous advances [21, 37].

An alternative, practical approach is to use less expensive partially homomorphic encryption (PHE) [50] in combination with property-preserving encryption (PPE) [44] to execute specific operations over encrypted data. Existing solutions based on PHE and PPE have mostly focused on database and batch processing systems. For instance, the seminal CryptDB [48] was implemented on top of the MySQL database, while Crypsis [59] was implemented in Apache Hadoop (Pig), and Cuttlefish [51, 53] and Symmetria [52] were implemented in Apache Spark. Such database-centric and batch processing solutions are not a good fit for many IoT applications that are implemented as continuous queries in a stream processing system. In addition, IoT devices can vary significantly in terms of their computing power and memory capacity. To enable devices that can be very resource-constrained (e.g., embedded devices using ARM Cortex-M3 can have as little as 72 MHz processing power and 64 KB memory) to encrypt sensitive data under various PHE and PPE schemes, these must be carefully implemented and optimized.

1.2 Challenges

A straightforward application of PHE and PPE to existing stream processing solutions to support computations over encrypted data is unlikely to be practical:

R1: Complexity of cryptosystems. Dozens of PHE and PPE schemes exist, varying by operations supported, efficiency, ciphertext size, and so on; IoT application developers do not necessarily possess sufficient knowledge of cryptosystems to judiciously select among these.

R2: Complexity of queries and data. Analytical continuous queries can become quite complex, leading to the intricate intertwining and combining of data items throughout lengthy sequences of processing stages. Tracking of data lineages becomes complex, yet it is necessary to determine which PHE or PPE schemes need to be applied to initial input data.

R3: Application variables & constants. Variable initialization and constants in queries must be carefully handled to preserve the confidentiality requirements.

R4: Inherent limitations. As their name suggests, PHE schemes do not support arbitrary operations. To overcome this limitation, the query processing can continue on the trusted client side after the intermediate results have been decrypted. Alternatively, the intermediate results can be re-encrypted to the schemes required by subsequent operations, after which computation can proceed in the cloud.

R5: Cryptosystems on IoT devices. IoT devices are often resource-constrained, with limited processing power and memory capacity. The ability to support PHE and PPE cryptosystems on IoT devices in an efficient manner is a major concern.

R6: Key compromises. With applications running continuously and potentially indefinitely, secret keys used for encryption need to be updated periodically or on-demand (e.g., when a device such as a health monitor is compromised). Such updates on IoT devices should be made transparently to the IoT application and should not cause disruptions to the execution of continuous queries or lead to missing results.

R7: Resource management. Finally, processing continuous queries typically involves a pipeline of computing tasks, each of which may have one or more instances running concurrently. The deployment profile that maps task instances to virtual machines (VMs) in the cloud should make balanced use of resources to avoid bottlenecks. While some mapping heuristics are known, they do not consider encryption, which shifts bottlenecks, e.g., by altering computation/communication overhead ratios.

Fig. 1. C3PO design overview.

1.3 C3PO Overview and Roadmap

This article presents C3PO (Cloud-based Confidentiality-preserving Continuous Query Processing), a novel managed runtime system that leverages PHE and PPE to provide confidentiality for IoT applications delegating online streaming jobs to the public cloud. C3PO operates on streaming data without revealing any plaintext information to the untrusted cloud. Figure 1 gives a high-level overview of the design of C3PO. A user designs, implements, and initiates the continuous query application that runs in the untrusted cloud, using the C3PO stream processing system. IoT devices automatically encrypt generated data before emitting them as part of streams for analysis. These devices can optionally be assigned to different groups for added security, as discussed in Section 6. Additional streams of encrypted private data required for analysis can be sent independently from a trusted tier maintained by the user.

To perform analytics in the untrusted cloud over encrypted data while at the same time addressing challenges R1-R7, C3PO provides several novel features. After presenting background information on PHE, PPE, and continuous queries (Section 2) and giving an overview of our solution (Section 3) including the assumed threat model and architecture of C3PO, this article makes the following contributions through C3PO and its features as outlined below:

  • Programming abstractions (Section 4): We propose an abstraction of secure streams, embodied in the C3PO API for typical plaintext streams, to enable programmers to conveniently express confidentiality-preserving continuous query programs. C3PO automatically transforms the program to work with encrypted streams, executing it efficiently in the public cloud. Developers thus focus on the application logic and not on the details of the underlying cryptosystems (R1), nor which specific cryptosystem to use for which part of the application (R2) or how to handle variable initialization and constant encryption (R3). C3PO is capable of continuing computation in the trusted tier or re-encrypting (parts of) a data stream to enable further computation in the public cloud if a given sequence of computations cannot be performed due to PHE limitations (R4).

  • Encryption optimization techniques (Section 5): We introduce new PHE and PPE optimization techniques (e.g., field masking and speculative encryption) and adapt existing optimizations to the setting of IoT (e.g., pre-computation, ciphertext packing, and caching) to reduce encryption time and ciphertext size overhead and provide efficient implementations of these techniques so they can run on resource-constrained IoT devices (R5).

  • Key management schemes (Section 6): We introduce key management schemes to support transparent periodic and on-demand rotation of secret keys on IoT devices and enable partitioning of data spaces (IoT groups) to reduce key sharing (R6). By partitioning data in an application-aware manner and by supporting on-the-fly key changes without disrupting the execution of continuous queries, our key management schemes limit breaches in cases of key compromises without hampering practicality.

  • Deployment optimization technique (Section 7): We propose a deployment heuristic that analyzes resource availability and requirements and generates a deployment profile that optimizes cloud resource usage (R7). The heuristic maximizes the amount of computation performed in the cloud when splitting computation between the untrusted cloud and a small number of trusted nodes (trusted tier) used to overcome the inherent limitations of PHE. From the perspective of deployment, we also analyze the security of C3PO.

  • Prototype implementation (Section 8): We present the C3PO system that implements our API and other features outlined above, building on the well-known Apache Storm system. C3PO analyzes programs written using the C3PO API and applies the above-mentioned heuristic after identifying computations that can be executed purely on encrypted data and computations that, due to the limitations of PHE, cannot.

  • Performance evaluation (Section 9): We evaluate C3PO on multiple benchmarks and case studies. Our results indicate that C3PO can be used to express many real-world IoT applications while ensuring confidentiality transparently and keeping a low overhead.

We contrast C3PO with related work in Section 10 and conclude with final remarks in Section 11.

The system described in this article supersedes our STYX system [61]. STYX promoted similar programming abstractions as C3PO but did not allow for limited key sharing in time (key rotations) or space (multiple groups). This article also makes PHE and PPE schemes applicable to resource-constrained IoT devices through the use of novel optimizations and presents additional empirical evaluation results, in particular with respect to encryption (and associated optimizations) on IoT devices and key management. A high-level perspective of STYX is presented in Reference [17].

2 BACKGROUND

In this section, we present background information on PHE, PPE, and the cryptosystems employed by C3PO. We then discuss relevant details of systems that support continuous queries.

2.1 Partially Homomorphic and Property-preserving Encryption

A cryptosystem is said to be homomorphic (with respect to certain operations) if it allows computations (consisting of such operations) on encrypted data. If \(E(x)\) and \(D(x)\) denote the encryption and decryption functions for input data \(x\), respectively (omitting keys for simplicity), then a cryptosystem is said to be homomorphic with respect to operation \(\phi\) if \(\exists\) operation \(\psi\) such that
\[ D(E(x_1) ~\psi ~ E(x_2)) = x_1 ~\phi ~ x_2 . \tag{1} \]
For example, a cryptosystem is said to be an additive homomorphic encryption (AHE) scheme when \(\phi\) is addition “\(+\).” Similarly, a cryptosystem is said to be a multiplicative homomorphic encryption (MHE) scheme when \(\phi\) is multiplication “\(\times\).” Another category of cryptosystems that allows computations over encrypted data is property-preserving encryption (PPE). As the name suggests, PPE schemes preserve some property of the underlying plaintext that in turn can be used to perform operations over encrypted values. These operations include order comparisons “\(\lt\)” and “\(\gt\)” (order-preserving encryption (OPE)), equality comparison “==” (deterministic (DET) encryption), or text searches by applying “pattern matching” (searchable encryption (SRCH)).
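To make Equation (1) concrete for the multiplicative case, the following is a minimal sketch of a toy ElGamal scheme in which both \(\phi\) and \(\psi\) are multiplication (component-wise on ciphertexts). The class name `ToyElGamal`, the 61-bit prime, and the base 3 are assumptions chosen for readability; they are far too small for real use and are not taken from C3PO.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

/** Toy ElGamal over a 61-bit prime field; illustration only, not secure. */
final class ToyElGamal {
    // Assumed toy parameters: the Mersenne prime 2^61 - 1 and a fixed base.
    static final BigInteger P = BigInteger.valueOf(2).pow(61).subtract(BigInteger.ONE);
    static final BigInteger G = BigInteger.valueOf(3);
    private static final SecureRandom RNG = new SecureRandom();

    private final BigInteger secret = new BigInteger(60, RNG).add(BigInteger.ONE);
    final BigInteger publicKey = G.modPow(secret, P);

    /** E(m) = (g^r, m * h^r) with fresh randomness r, so encryption is probabilistic. */
    BigInteger[] encrypt(long m) {
        BigInteger r = new BigInteger(60, RNG).add(BigInteger.ONE);
        BigInteger c1 = G.modPow(r, P);
        BigInteger c2 = BigInteger.valueOf(m).multiply(publicKey.modPow(r, P)).mod(P);
        return new BigInteger[] { c1, c2 };
    }

    /** The operation psi: component-wise product of two ciphertexts. */
    static BigInteger[] multiply(BigInteger[] a, BigInteger[] b) {
        return new BigInteger[] { a[0].multiply(b[0]).mod(P), a[1].multiply(b[1]).mod(P) };
    }

    /** D(c1, c2) = c2 / c1^secret mod P. */
    long decrypt(BigInteger[] c) {
        BigInteger shared = c[0].modPow(secret, P);
        return c[1].multiply(shared.modInverse(P)).mod(P).longValueExact();
    }

    public static void main(String[] args) {
        ToyElGamal eg = new ToyElGamal();
        BigInteger[] product = multiply(eg.encrypt(6), eg.encrypt(7));
        System.out.println(eg.decrypt(product)); // D(E(6) psi E(7)) = 6 * 7
    }
}
```

The party calling `multiply` needs neither the secret key nor the plaintexts, which is the property that lets an untrusted tier compute on such ciphertexts.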

Table 1 summarizes the cryptosystems used by C3PO and shows example operations for each of them. C3PO uses Paillier [43] as its AHE scheme to avoid high ciphertext expansion and high decryption costs. By comparison, the Goldwasser-Micali cryptosystem [23] supports homomorphic addition only on single-bit inputs, leading to larger ciphertexts, and the Benaloh cryptosystem [5] has a decryption time that depends on the security parameter, which makes decryption more expensive as that parameter increases. MHE schemes include the ElGamal [18] and unpadded RSA [50] cryptosystems. C3PO uses ElGamal for multiplications, as it is semantically secure, unlike unpadded RSA. Table 1 also shows that C3PO utilizes a set of PPE schemes that reveal uniqueness of encrypted values (DET) or ordering of encrypted values (OPE). In Section 3.1, we discuss the information leakage due to these encryption schemes and describe ways in which C3PO reduces this leakage.

Table 1. C3PO Cryptosystems and Operations they Support over Encrypted Data

| Cryptosystem | Property | Type | Security | Operations | Secondary operations |
|---|---|---|---|---|---|
| AES (CBC mode) | RND | | Probabilistic | | |
| AES (CMC mode [27]) | DET | PPE | Deterministic | \(==\) | |
| FNR [15] | DET | PPE | Deterministic | \(==\) | |
| Boldyreva et al. [7] | OPE | PPE | Deterministic | \(\lt\), \(\gt\) | |
| Song et al. [58] | SRCH | PPE | Probabilistic | string match | |
| ElGamal [18] | MHE | PHE | Probabilistic | \(\times\), \(\div\) | \(\times\), \(\wedge\) |
| Paillier [43] | AHE | PHE | Probabilistic | \(+\), \(-\) | \(+\), \(\times\) |

  • “Operations” column lists operations between two encrypted values, and “Secondary operations” column lists operations between an encrypted value and a plaintext value. “\(\div\)” denotes multiplicative inverse, and “\(\wedge\)” denotes exponentiation.

2.2 Secondary Homomorphic Operations

The operations supported by each cryptosystem as shown in Table 1 require that their operands are both encrypted under the same cryptosystem. In addition to these “primary” operations, some cryptosystems support “secondary” operations as long as one of the operands is in plaintext form (non-sensitive operand value). Consider, for example, the Paillier cryptosystem [43] with a public key \((g, N)\), where \(g\) is the generator and \(N\) is the modulus. Paillier primarily supports addition between two encrypted values, \(E(x_1)\) and \(E(x_2)\):
\[ D(E(x_1) \times E(x_2) \bmod N^2) = (x_1 + x_2) \bmod N \tag{2} \]
Paillier also supports addition and multiplication between a ciphertext, \(E(x_1)\), and a (non-sensitive) plaintext, \(x_2\):
\[ D(E(x_1) \times g^{x_2} \bmod N^2) = (x_1 + x_2) \bmod N \tag{3} \]
\[ D(E(x_1)^{x_2} \bmod N^2) = (x_1 \times x_2) \bmod N \tag{4} \]
Homomorphic subtraction can be achieved by performing an addition between the first operand and the additive inverse (by performing multiplication by \(-1\)) of the second operand:
\[ D(E(x_1) \times E(x_2)^{-1} \bmod N^2) = (x_1 - x_2) \bmod N \tag{5} \]
Similarly, the ElGamal cryptosystem [18] supports multiplication and division (multiplication with the multiplicative inverse) between two encrypted values and multiplication/exponentiation between an encrypted and a plaintext value.
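Equations (2) to (5) can be checked numerically with a small sketch of textbook Paillier. The class `ToyPaillier`, the choice \(g = N + 1\), and the two small primes are assumptions made for illustration; this is not C3PO's implementation, and the key size offers no security.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

/** Toy textbook Paillier with g = N + 1; tiny primes, illustration only. */
final class ToyPaillier {
    private static final SecureRandom RNG = new SecureRandom();

    final BigInteger n;        // modulus N = p * q
    final BigInteger nSquared; // N^2, the ciphertext modulus
    final BigInteger g;        // generator, here N + 1
    private final BigInteger lambda; // lcm(p - 1, q - 1)
    private final BigInteger mu;     // L(g^lambda mod N^2)^{-1} mod N

    ToyPaillier(long p, long q) {
        BigInteger bp = BigInteger.valueOf(p), bq = BigInteger.valueOf(q);
        n = bp.multiply(bq);
        nSquared = n.multiply(n);
        g = n.add(BigInteger.ONE);
        BigInteger p1 = bp.subtract(BigInteger.ONE), q1 = bq.subtract(BigInteger.ONE);
        lambda = p1.multiply(q1).divide(p1.gcd(q1));
        mu = l(g.modPow(lambda, nSquared)).modInverse(n);
    }

    /** L(u) = (u - 1) / N. */
    private BigInteger l(BigInteger u) { return u.subtract(BigInteger.ONE).divide(n); }

    /** E(x) = g^x * r^N mod N^2 for a random r coprime to N. */
    BigInteger encrypt(long x) {
        BigInteger r;
        do {
            r = new BigInteger(n.bitLength(), RNG).mod(n);
        } while (r.signum() == 0 || !r.gcd(n).equals(BigInteger.ONE));
        BigInteger m = BigInteger.valueOf(x).mod(n);
        return g.modPow(m, nSquared).multiply(r.modPow(n, nSquared)).mod(nSquared);
    }

    /** D(c) = L(c^lambda mod N^2) * mu mod N. */
    long decrypt(BigInteger c) {
        return l(c.modPow(lambda, nSquared)).multiply(mu).mod(n).longValueExact();
    }

    /** Eq. (2): ciphertext times ciphertext adds the plaintexts. */
    BigInteger add(BigInteger c1, BigInteger c2) { return c1.multiply(c2).mod(nSquared); }

    /** Eq. (3): ciphertext times g^x2 adds a plaintext constant (x2 >= 0). */
    BigInteger addPlain(BigInteger c1, long x2) {
        return c1.multiply(g.modPow(BigInteger.valueOf(x2), nSquared)).mod(nSquared);
    }

    /** Eq. (4): ciphertext raised to x2 multiplies by a plaintext constant (x2 >= 0). */
    BigInteger mulPlain(BigInteger c1, long x2) {
        return c1.modPow(BigInteger.valueOf(x2), nSquared);
    }

    /** Eq. (5): multiplying by the inverse ciphertext subtracts. */
    BigInteger sub(BigInteger c1, BigInteger c2) {
        return c1.multiply(c2.modInverse(nSquared)).mod(nSquared);
    }
}
```

With `new ToyPaillier(1009, 1013)`, decrypting `sub(encrypt(8), encrypt(50))` yields \(N - 42\) rather than \(-42\), since results are reduced modulo \(N\); a signed interpretation has to be layered on top by the application.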

Paillier and ElGamal are defined in finite cyclic groups with a configurable plaintext space. Homomorphic operations are defined in these groups, and it is up to the application programmer to ensure that the associated plaintext spaces are large enough to accommodate the application needs and avoid overflows (due to either “primary” or “secondary” homomorphic operations). This is usually not a problem, since Paillier and ElGamal commonly use a plaintext space of 2,048 bits or more, which means they can encrypt plaintext values of up to 2,048 bits. Such large spaces are needed because the security of these cryptosystems depends on problems that are hard to solve only for large numbers, such as the decisional composite residuosity assumption for Paillier and the decisional Diffie-Hellman assumption for ElGamal. In contrast, continuous query applications usually work with integer values that are 32 or 64 bits. We note that, since homomorphic operations are defined in cyclic groups, homomorphic division works only when the dividend is divisible by the divisor. Otherwise, the division operation will still succeed, but the result after decryption will be an incorrect, very large (positive or negative) whole number. C3PO cannot identify a priori which homomorphic division operations will fail, but it can detect which division operations have failed after the results have been decrypted. Alternatively, C3PO can be configured to perform all divisions in the trusted tier, which poses no restrictions on the division operands.
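The division caveat can be reproduced with plain modular arithmetic, since a homomorphic division decrypts to the dividend multiplied by the modular inverse of the divisor. The sketch below works directly on plaintexts; the class `ModularDivisionDemo`, the 61-bit prime, and the bit-length check `looksValid` are assumptions for illustration and do not describe how C3PO itself detects failed divisions.

```java
import java.math.BigInteger;

/** Division in a prime field: exact when divisible, a huge residue otherwise. */
final class ModularDivisionDemo {
    // Assumed toy plaintext modulus (the prime 2^61 - 1); real schemes use far larger ones.
    static final BigInteger P = BigInteger.valueOf(2).pow(61).subtract(BigInteger.ONE);

    /** Computes dividend * divisor^{-1} mod P, the value a homomorphic division decrypts to. */
    static BigInteger divide(long dividend, long divisor) {
        BigInteger inverse = BigInteger.valueOf(divisor).modInverse(P);
        return BigInteger.valueOf(dividend).multiply(inverse).mod(P);
    }

    /** Hypothetical post-decryption check: application values are assumed to fit in maxBits. */
    static boolean looksValid(BigInteger result, int maxBits) {
        return result.bitLength() <= maxBits;
    }

    public static void main(String[] args) {
        BigInteger exact = divide(42, 6);   // 6 divides 42, so this is the true quotient
        BigInteger inexact = divide(43, 6); // 6 does not divide 43
        System.out.println(exact + " valid=" + looksValid(exact, 32));
        System.out.println(inexact + " valid=" + looksValid(inexact, 32));
    }
}
```

In the inexact case the result still satisfies \(6 \times q \equiv 43 \pmod{P}\), so it is a legitimate field element; it is only meaningless as an integer quotient, which is why a range check on the decrypted value can flag it.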

To make PHE and PPE schemes more suitable for IoT devices, we present extensions and optimizations applied to them in Section 5, where we also discuss how C3PO handles overflows while supporting negative numbers. We also give details about their implementation in Section 8.

2.3 Continuous Queries

The core abstractions offered by systems that support continuous queries are streams, tuples, and fields. Streams are unbounded sequences of tuples. Each tuple within a stream contains one or more fields with each field having an associated name. Values of each field can be accessed by dereferencing a tuple by the field name or the field index. Tuples in the same stream have the same set of fields (with distinct values). The tuples in the stream are processed in a distributed fashion. Application logic is arranged as a directed graph where vertices of the graph are computation components and edges are streams that represent the data flow between components. Application programmers write application logic for vertices of the graph. A subset of these vertices is also designated as source vertices. Source vertices act as entry points for data into the graph. Source vertices typically read data from a queue, log file, or external subscriptions. As data is generated in real-time and added to a queue, it is picked up by source vertices and forwarded down the graph for processing according to a grouping clause (described below).

Fig. 2. C3PO graph and tasks.

Figure 2 shows an example graph with four vertices with \(v_1\) designated as the source vertex. The figure also shows streams \(s_1-s_4\). Each vertex of the graph may have multiple runtime instantiations called tasks. Vertex \(v_1\) has one task running on public node \(n_1\) (a node represents a virtual or physical machine), \(v_2\) has two tasks running on public node \(n_1\), and \(v_3\) has three tasks running on public node \(n_2\). Finally, \(v_4\) has three tasks running on a trusted node \(n_3\). We refer to this assignment, i.e., the specific number of tasks per vertex, as the deployment profile of the graph.

The stream emitted by each vertex is declared explicitly in the vertex itself. Once all vertices of the graph are designated, the graph is assembled by defining the input stream of each vertex and specifying the grouping clause of each stream. A grouping clause defines how tuples in a stream are partitioned among the tasks of a vertex that receives the stream. This grouping clause is also part of the definition of the graph and is provided by the programmer. Common grouping clauses are:

(1) shuffle grouping – tuples are distributed randomly across tasks in such a way that each task gets an equal number of tuples,

(2) field grouping – tuples are partitioned according to a designated field and distributed among tasks, and

(3) all grouping – the stream is replicated across all tasks.
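The three grouping clauses can be sketched as functions that pick the receiving task index (or indices) for a tuple. The class `GroupingSketch` and its method names are hypothetical, and round-robin is used as a simple stand-in for the balanced random distribution of shuffle grouping; this is not the routing code of Storm or C3PO.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical task selection for the three common grouping clauses. */
final class GroupingSketch {
    private int next = 0; // round-robin cursor used by shuffle()

    /** Shuffle grouping, approximated by round-robin so every task gets an equal share. */
    int shuffle(int numTasks) {
        int task = next;
        next = (next + 1) % numTasks;
        return task;
    }

    /** Field grouping: equal field values always map to the same task. */
    static int byField(Object fieldValue, int numTasks) {
        return Math.floorMod(Objects.hashCode(fieldValue), numTasks);
    }

    /** All grouping: the tuple is replicated to every task. */
    static List<Integer> all(int numTasks) {
        List<Integer> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(i);
        }
        return tasks;
    }
}
```

Because `byField` depends only on equality of the field value, it keeps all tuples of one group on the same task, which is what a per-group aggregation needs.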

3 C3PO Overview

In this section, we first introduce C3PO’s threat model and discuss some assumptions about the IoT devices used in C3PO. We then present C3PO’s programming abstractions and runtime execution flow.

3.1 Threat Model

The goal of C3PO is to preserve data confidentiality in the presence of a semi-honest adversary. We assume the adversary has read-only access to the cloud nodes and can observe data residing in the nodes, execution of applications, and any generated intermediate results. We assume that the adversary cannot make changes in the queries, results, or data stored in the cloud and consider integrity and availability attacks to be out of scope for our system. We also consider IoT device compromises out of the scope of our article and focus on preserving the confidentiality of data in the untrusted cloud. We later relax this requirement and show how encryption keys of IoT devices can be updated in case of a compromise (Section 6). We assume that C3PO has access to a limited set of trusted resources outside the cloud (e.g., where the query results are used). As we will see shortly, this environment is leveraged to perform a few specific computations.

C3PO utilizes a set of encryption schemes such as DET and OPE schemes that are deterministic (see Table 1) and are known to provide lower security guarantees. DET schemes reveal the uniqueness of encrypted values, since the same plaintext is encrypted into the same ciphertext, unlike probabilistic schemes that randomize ciphertexts. OPE schemes reveal the ordering of encrypted values and, in some instances, they have been shown to leak partial plaintext [22, 24, 39]. To reduce the use of DET and OPE schemes, C3PO issues a warning to the programmer when the application requires DET or OPE. The programmer can then either deploy the parts of the query that would otherwise require DET or OPE operations on the trusted resources, at the expense of performance, or deploy the application as is. The latter option could be viable if the data requiring DET or OPE holds semi-sensitive or high-entropy information such as timestamps. C3PO could also benefit from using database indices such as ArxRange and ArxEq introduced in recent work [46] to perform range and equality queries, respectively, in a manner that preserves semantic security, but we have not yet incorporated these primitives into the current implementation of C3PO.
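The difference between deterministic and probabilistic encryption can be observed with the JDK's built-in AES. In this sketch, AES in ECB mode on a short value is used purely as a deterministic stand-in (C3PO's DET schemes are AES in CMC mode and FNR, which the JDK does not provide), and AES in CBC mode with a random IV plays the role of the probabilistic RND scheme. The class and method names are assumptions.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Contrasts a deterministic stand-in (AES-ECB) with probabilistic AES-CBC. */
final class DetVsRndDemo {
    private static final SecureRandom RNG = new SecureRandom();

    /** Deterministic: the same key and plaintext always give the same ciphertext. */
    static byte[] encryptDet(byte[] key, byte[] plaintext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
            return cipher.doFinal(plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Probabilistic: a fresh random IV is drawn per call and prepended to the ciphertext. */
    static byte[] encryptRnd(byte[] key, byte[] plaintext) {
        try {
            byte[] iv = new byte[16];
            RNG.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            byte[] body = cipher.doFinal(plaintext);
            byte[] out = new byte[iv.length + body.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(body, 0, out, iv.length, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

An observer in the cloud can test two `encryptDet` outputs for equality without any key, which is exactly the capability that enables "==" over ciphertexts and also the leakage described above; two `encryptRnd` outputs of the same value are unrelated byte strings.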

3.2 IoT Device Assumptions

IoT device classes. We assume that IoT devices used in C3PO are of the C2 class or higher (see RFC 7228 [8], “Classes of Constrained Devices”) with at least 50 KB of RAM and a CPU operating at a frequency of at least a few tens of MHz. In addition, we assume that IoT devices are in the E9 class of energy limitation (RFC 7228 [8], “Classes of Energy Limitation”) with no direct quantitative limitations to available energy. We plan to incorporate battery-powered IoT devices with limited energy capacity and examine the effect encryption has on battery life as part of our future work.

Key sharing. Managing encryption keys for IoT devices in a distributed setting is a challenging problem [10, 65]. To distribute and manage keys, we use a public key infrastructure (e.g., Keybase [30], ZeroTier [54]), a standard assumption in multi-party systems. However, secret keys on devices can be compromised by an attacker having access to the device. Trusted hardware solutions available for various classes of IoT devices can be exploited to securely store secret keys [3]. Key sharing and storage is an independent active research topic, beyond the scope of this article; the focus of C3PO is to ensure the confidentiality of the data in the cloud. In C3PO, we assume that all IoT devices are owned by a single party (e.g., a healthcare provider issuing health monitoring IoT devices). IoT devices are connected to the Internet and are capable of establishing a secure, authenticated channel to the device owner (key manager), allowing keys to be updated using standard protocols (e.g., TLS [16]) following prior work [57].

3.3 Programming Abstractions

One of the main challenges of computing over encrypted data is that the application developer needs to have a detailed understanding of each cryptosystem used to encrypt fields of a stream. Adoption of PHE and even FHE for generic application development will depend on the ease with which a programmer can incorporate the properties offered by the cryptosystem into their regular programming tasks. C3PO tackles this problem by offering simple programming abstractions to express and operate on encrypted data streams. Fields representing sensitive data are defined using the SecField abstraction irrespective of the operation that needs to be performed on them or the underlying cryptosystem used to represent the field, thereby relieving the programmer from the task of identifying the cryptosystem required for the operation at hand. A programmer simply annotates the stream with the desired operation and C3PO deduces the cryptosystem that needs to be used at the source IoT devices. Section 4 gives more details about the C3PO API.
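The deduction step can be pictured as a lookup from each annotated operation to the property in Table 1 that supports it. The sketch below is a simplified guess at that idea, not C3PO's homomorphism analysis: the class `EncryptionStrategySketch`, the operation names other than "eq" and "sum", and the rule that a field with no operations falls back to RND are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mapping from annotated operations to the properties of Table 1. */
final class EncryptionStrategySketch {
    enum Scheme { RND, DET, OPE, SRCH, MHE, AHE }

    /** Returns the property whose cryptosystem supports the given operation. */
    static Scheme schemeFor(String operation) {
        switch (operation) {
            case "none":     return Scheme.RND;  // no computation needed: strongest scheme
            case "eq":       return Scheme.DET;  // equality comparison
            case "compare":  return Scheme.OPE;  // order comparison
            case "match":    return Scheme.SRCH; // string match
            case "sum":
            case "subtract": return Scheme.AHE;  // Paillier
            case "multiply":
            case "divide":   return Scheme.MHE;  // ElGamal
            default:
                throw new IllegalArgumentException("unsupported operation: " + operation);
        }
    }

    /** One scheme per stream field, in field order. */
    static List<Scheme> strategy(String... operationsPerField) {
        List<Scheme> schemes = new ArrayList<>();
        for (String operation : operationsPerField) {
            schemes.add(schemeFor(operation));
        }
        return schemes;
    }
}
```

For a stream whose first field is compared for equality and whose second field is summed, `strategy("eq", "sum")` yields DET followed by AHE, so the source devices would encrypt the first field deterministically and the second under the additive scheme.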

3.4 Execution Flow

Fig. 3. C3PO execution flow.

Figure 3 outlines the steps followed by C3PO to set up and deploy an application securely in an untrusted cloud. Application programmers use the C3PO API and associated annotations to describe a graph that contains the application logic. C3PO then performs homomorphism analysis on the graph to generate an encryption strategy, containing the cryptosystems required to execute the graph in a confidential manner. The encryption strategy is then passed to the key manager that generates keys for each cryptosystem, as described in Section 6, and sends them to the IoT devices using a secure channel. Next, C3PO analytically identifies the number of tasks required for different vertices and schedules the graph for execution. C3PO leverages the idea that oftentimes users have some limited (but trusted) computing resources available. We refer to these resources as the trusted tier. The compute resources in the cloud, though potentially unlimited for practical purposes, are untrusted. C3PO utilizes the trusted tier for application development and compilation and uses the cloud for the deployment phase. In deployments that require resources from the trusted tier, C3PO tries to minimize their usage. The deployment steps are detailed in Section 7.

4 C3PO Stream Processing

In this section, we describe the programming abstractions used in C3PO and explain how these abstractions are used in addressing challenges R1-R4 and leveraged for improving performance.

4.1 Programming Model

Application programmers use the abstractions provided by our C3PO Java API to specify the C3PO graph. Each vertex in the graph is designed as a separate class by extending the C3POVertex class. The stream emitted by each vertex is declared explicitly in the vertex itself. Once all vertices are designed, the graph is put together by defining the input stream of each vertex and specifying the grouping clauses.

Abstractions. Next, we explain the abstractions that are new to C3PO over typical stream processing systems such as Storm:

C3POVertex: This base class is extended by programmers to express the computation in a vertex of the graph. The class provides the execute() method, which is invoked by C3PO when a tuple containing SecFields arrives at a vertex for processing.

SecField: Programmers use this class to realize the abstraction of a secure field that refers to a confidential input field. Programmers can get a reference to a SecField value in a tuple using the SecField.getField() method or by reading an encrypted value directly from a stream.

SecOper: The secure operator class provided by C3PO allows programmers to express standard operations such as add, subtract, multiply, divide, compare, equals, and match. Both primary and secondary operations listed in Table 1 are exposed through functions of the SecOper class. These functions take SecFields (or a SecField and an int value for secondary operations) as input and return a SecField for arithmetic operations, a Boolean value for equality comparisons and string matching operations, or an integer (\(-\)1, 0, or 1) for order comparisons.

@encOperations: Programmers can also annotate each stream with the operations they want to perform on that stream through the @encOperations annotation. These annotations enable our compile-time graph analysis to identify the suitable cryptosystems for performing these operations and to apply additional performance improvement techniques, introduced shortly in Section 4.2, without requiring a modified compiler.

Example.


List. 1. C3PO code for finding the sum of each group in a sliding window.

List 1 shows a code snippet used in a C3PO vertex class extending the C3POVertex class (Line 3). The code is part of a graph that keeps track of the sum of values in different groups within a sliding window, the last 60 seconds (numSlots = 60) in this example, as indicated by Line 4. The input tuple contains two fields: the group name and the value for that group. The code shown retrieves the group and value fields from the input tuple (Lines 9, 10) and updates the sum for that group’s current time slot (Line 11) with the value. Note that the annotation @encOperations(operations = {"{eq}", "{sum}"}) in Line 2 indicates to the compiler that the first field of the stream is used in equality comparisons and the second field in summing. Every time the vertex receives a timing tuple, signifying a minute has elapsed (code omitted), it emits the sum of all groups in the current sliding window (Line 7). The object maintaining the sliding window internally contains a map and updates the group’s sum every time the updateSum() method is called using C3PO’s SecOper.add() method (Line 24).


List. 2. Code for finding the sum of each group in a sliding window without C3PO abstractions.

List 2 shows just the function updateSum() from List 1 written without using C3PO abstractions. This example includes several implementation complexities and requires the programmer to

(1) know that Paillier is the correct cryptosystem to use for performing additions;

(2) explicitly read the Paillier public key (Line 3) that contains the generator, \(g\), and the modulus, \(N\); and

(3) perform the exact computation \(\psi\) (see Equation (1)) for homomorphic addition with Paillier, i.e., multiplication modulo the square of the modulus \(N\) of the public key (Line 11), including handling of null values (Line 10).
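To make the computation in item (3) concrete, the following is a minimal, self-contained sketch of Paillier homomorphic addition in Java. It is not the code of List 2: the class name `PaillierAddSketch`, the 512-bit toy key size, and the choice of generator \(g = N + 1\) are assumptions made for illustration only.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Toy Paillier cryptosystem (illustrative only; key size is far too small for real use).
final class PaillierAddSketch {
    final BigInteger n, nSquared, g;       // public key (assumed generator g = N + 1)
    private final BigInteger lambda, mu;   // private key
    private final SecureRandom rnd = new SecureRandom();

    PaillierAddSketch(int bits) {
        BigInteger p = BigInteger.probablePrime(bits / 2, rnd);
        BigInteger q;
        do { q = BigInteger.probablePrime(bits / 2, rnd); } while (q.equals(p));
        n = p.multiply(q);
        nSquared = n.multiply(n);
        g = n.add(BigInteger.ONE);
        BigInteger p1 = p.subtract(BigInteger.ONE), q1 = q.subtract(BigInteger.ONE);
        lambda = p1.multiply(q1).divide(p1.gcd(q1));   // lcm(p - 1, q - 1)
        mu = lambda.modInverse(n);
    }

    // Enc(m) = g^m * r^N mod N^2 for a fresh random r coprime to N (m must be non-negative).
    BigInteger encrypt(long m) {
        BigInteger r;
        do { r = new BigInteger(n.bitLength(), rnd); }
        while (r.signum() == 0 || r.compareTo(n) >= 0 || !r.gcd(n).equals(BigInteger.ONE));
        BigInteger gm = g.modPow(BigInteger.valueOf(m), nSquared);
        return gm.multiply(r.modPow(n, nSquared)).mod(nSquared);
    }

    // Dec(c) = L(c^lambda mod N^2) * mu mod N, with L(u) = (u - 1) / N.
    BigInteger decrypt(BigInteger c) {
        BigInteger u = c.modPow(lambda, nSquared);
        return u.subtract(BigInteger.ONE).divide(n).multiply(mu).mod(n);
    }

    // Homomorphic addition: multiply ciphertexts modulo N^2; a null operand is treated as "no value yet".
    BigInteger add(BigInteger c1, BigInteger c2) {
        if (c1 == null) return c2;
        if (c2 == null) return c1;
        return c1.multiply(c2).mod(nSquared);
    }

    public static void main(String[] args) {
        PaillierAddSketch ph = new PaillierAddSketch(512);
        BigInteger sum = ph.add(ph.encrypt(17), ph.encrypt(25));
        System.out.println(ph.decrypt(sum)); // prints 42
    }
}
```

Treating a null operand as an absent value is one plausible reading of the null handling mentioned in item (3); the original listing may handle it differently.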

These implementation complexities are not specific to summation and the Paillier cryptosystem. For example, ciphertexts of the ElGamal cryptosystem contain two components and homomorphic multiplication of two ciphertexts is achieved by multiplying the two components of the ciphertexts, respectively, to generate the encrypted result. Similarly, equality comparisons, order comparisons, and search over encrypted data operations require non-trivial computations over the ciphertexts. The C3PO API hides all these implementation complexities from the application programmer.
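The ElGamal case mentioned above can be sketched in the same spirit. The code below is a toy, self-contained illustration of component-wise ciphertext multiplication; the class name `ElGamalMulSketch`, the fixed base 2, and the 256-bit modulus are assumptions chosen for brevity, not parameters taken from C3PO.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Toy ElGamal over Z_p^* (illustrative only; parameters are not chosen for security).
final class ElGamalMulSketch {
    final BigInteger p, g, h;        // public key: prime modulus, base, h = g^x mod p
    private final BigInteger x;      // secret key
    private final SecureRandom rnd = new SecureRandom();

    ElGamalMulSketch(int bits) {
        p = BigInteger.probablePrime(bits, rnd);
        g = BigInteger.valueOf(2);   // assumed toy base; a real deployment picks a proper generator
        x = new BigInteger(bits - 2, rnd).add(BigInteger.ONE);
        h = g.modPow(x, p);
    }

    // Enc(m) = (g^r, m * h^r) for a fresh random r; m must lie in [1, p - 1].
    BigInteger[] encrypt(long m) {
        BigInteger r = new BigInteger(p.bitLength() - 2, rnd).add(BigInteger.ONE);
        BigInteger c1 = g.modPow(r, p);
        BigInteger c2 = BigInteger.valueOf(m).multiply(h.modPow(r, p)).mod(p);
        return new BigInteger[] { c1, c2 };
    }

    // Homomorphic multiplication: multiply the two components separately, modulo p.
    BigInteger[] multiply(BigInteger[] a, BigInteger[] b) {
        return new BigInteger[] { a[0].multiply(b[0]).mod(p), a[1].multiply(b[1]).mod(p) };
    }

    // Dec(c1, c2) = c2 * (c1^x)^(-1) mod p.
    BigInteger decrypt(BigInteger[] c) {
        BigInteger shared = c[0].modPow(x, p);
        return c[1].multiply(shared.modInverse(p)).mod(p);
    }

    public static void main(String[] args) {
        ElGamalMulSketch eg = new ElGamalMulSketch(256);
        BigInteger[] product = eg.multiply(eg.encrypt(6), eg.encrypt(7));
        System.out.println(eg.decrypt(product)); // prints 42
    }
}
```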

4.2 Processing Secure Streams

We now give details on how C3PO tackles challenges 1–4 introduced in Section 1.2 when processing continuous queries over encrypted streams.

Identifying encryption schemes (1, 2). This step identifies the cryptosystems that are required for the various fields based on the operations that the application wishes to perform on those fields. To apply these inferences, C3PO first has to identify different streams and their grouping clauses in the application logic. These can be derived from the graph declaration provided by the application programmer, as explained in Section 4.1. Second, C3PO derives the operations performed on each stream from program annotations (@encOperations) in each vertex class in the graph.

Once C3PO derives the distinct streams and operations to be performed on those streams, we can proceed similarly as in our prior work [60] to infer the cryptosystems required to execute the graph. In brief, we start by constructing an expression tree where fields in tuples form the leaf nodes. Operations performed on those fields form the non-leaf nodes. For C3PO graphs, we use field annotations (as specified in Section 4.1) to determine the non-leaf, operator nodes. For each operator, a lookup table identifies the cryptosystem of the operands and the result of the operator. Our goal now is to identify the cryptosystem in which all the leaf nodes (fields) should be encrypted. This can be done by identifying the parent operator node for each leaf node and using the lookup table to identify the type of cryptosystem required for operands for that operator node.

Handling public streams (2). An application might need to process plaintext data, such as publicly available stock quotes, from public streams as well as encrypted data. Public data can still be used in combination with private, encrypted streams to carry out useful computations. C3PO achieves this by allowing vertices to receive tuples containing both encrypted (SecField) and plaintext data. Through the use of the secondary homomorphic operations described in Section 2.1, SecFields and plaintext data can be combined without compromising data confidentiality.

Field masking (1). Computing on encrypted data introduces the additional challenge of dealing with operands with increased sizes. For instance, an addition of two 32-bit int operands in plaintext may transform into an operation over 4,096-bit encrypted operands in the encrypted data stream. This means a factor of 128 increase in the operand size. Typically, in stream processing systems the source vertex receives all the fields in the stream, irrespective of a field being used or not. For plaintext program graphs, this is usually not a substantial overhead, and the additional computation required for removing unused fields may not always offset the improvement that is observed. When the computation happens over encrypted data, filtering out fields has a much more significant impact because of the size of the fields. For example, consider a stream with two fields similar to the stream used for group by...sum in List 1. The first field is encrypted under a DET scheme and occupies 16 bytes, and the second field is encrypted under Paillier and occupies 512 bytes. If there is a continuous query that finds unique groups, then the second field will be unused. Simply removing the unused field reduces the size of a tuple from 528 bytes to 16 bytes. C3PO performs this unused field removal automatically using field masking. Since an unused field may be at any index within a tuple, if we simply drop the field, then program logic that accesses other fields using their indices may fail. To avoid this problem, during compile time C3PO keeps track of the indices of unused fields and appropriately adjusts all other indices that appear in program logic. To identify unused fields, C3PO relies on the stream annotations described in Section 4.1. For each vertex, we identify fields for which no operations are specified. Our masking process itself is very lightweight. Since we have information about fields to be masked at compile-time, we update the C3PO runtime with this information. 
C3PO then suppresses the emission of the masked fields.
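The index adjustment behind field masking can be illustrated with a short sketch. The helper names `remapIndices` and `maskTuple` are hypothetical and do not correspond to the C3PO runtime; the example only reproduces the 528-byte to 16-byte reduction discussed above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical helpers illustrating field masking; not part of the C3PO API.
final class FieldMaskingSketch {
    // newIndex[i] is the position of original field i once masked fields are dropped, or -1 if masked.
    static int[] remapIndices(int fieldCount, Set<Integer> masked) {
        int[] newIndex = new int[fieldCount];
        int next = 0;
        for (int i = 0; i < fieldCount; i++) {
            newIndex[i] = masked.contains(i) ? -1 : next++;
        }
        return newIndex;
    }

    // Emits only the fields that are not masked, preserving their relative order.
    static <T> List<T> maskTuple(List<T> tuple, Set<Integer> masked) {
        List<T> out = new ArrayList<>();
        for (int i = 0; i < tuple.size(); i++) {
            if (!masked.contains(i)) out.add(tuple.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> tuple = new ArrayList<>();
        tuple.add(new byte[16]);    // DET-encrypted group field
        tuple.add(new byte[512]);   // Paillier-encrypted value field (unused by the query)
        int size = 0;
        for (byte[] field : maskTuple(tuple, Collections.singleton(1))) size += field.length;
        System.out.println(size + " bytes"); // prints 16 bytes
    }
}
```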

Initialization and constants (3). Oftentimes, application logic requires variables to be initialized to a specific value, say, \(\alpha\). To preserve confidentiality, value \(\alpha\) cannot remain in plaintext and should instead be encrypted under the appropriate cryptosystem during program compilation. To identify the appropriate cryptosystem for encrypting \(\alpha\), C3PO first identifies the operation and the SecField that \(\alpha\) is used in. Then C3PO uses the @encOperations annotation to identify the cryptosystem used to encrypt the relevant SecField and uses the same cryptosystem to encrypt \(\alpha\). Similarly, C3PO encrypts any constants in the application.

Automatic re-encryption (4). Once the analyzer determines the cryptosystems required for each stream, it may detect situations where some operations cannot be performed over the available cryptosystems in the cloud. This can occur if there is a mismatch between parent and child operator nodes, because they express operations not supported by the same cryptosystem. In such cases, C3PO can either perform those operations in the trusted tier or re-encrypt the stream in the trusted tier. For example, for conditions that require more than one encryption scheme to be used on the same variable, e.g., \(x + y \gt \alpha\), or for conditions that include a public value, such as \(secret\_value \gt public\_const\), where the public value cannot be encrypted as it is already public, C3PO will perform the entire control structure in the trusted tier. For re-encryption, C3PO inserts special re-encryption vertices into the graph and marks them so they get scheduled on the trusted tier only.
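The expression-tree analysis and the parent/child mismatch check can be sketched as follows. This is a simplified illustration under stated assumptions: the `Op` and `Scheme` enums, the two lookup tables, and the `analyze` function are hypothetical stand-ins, and the operator-to-scheme mapping (AHE for addition, MHE for multiplication, DET for equality, OPE for order comparison) is assumed rather than copied from Table 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HomomorphismAnalysisSketch {
    enum Op { ADD, MULTIPLY, EQUALS, COMPARE }
    enum Scheme { AHE, MHE, DET, OPE }

    // Assumed lookup tables: scheme required for an operator's operands, and scheme of its result.
    static final Map<Op, Scheme> OPERAND = new EnumMap<>(Op.class);
    static final Map<Op, Scheme> RESULT = new EnumMap<>(Op.class);
    static {
        OPERAND.put(Op.ADD, Scheme.AHE);
        OPERAND.put(Op.MULTIPLY, Scheme.MHE);
        OPERAND.put(Op.EQUALS, Scheme.DET);
        OPERAND.put(Op.COMPARE, Scheme.OPE);
        RESULT.put(Op.ADD, Scheme.AHE);        // comparisons yield plaintext results, so no entry
        RESULT.put(Op.MULTIPLY, Scheme.MHE);
    }

    static final class Node {
        final String field;          // non-null for leaf (field) nodes
        final Op op;                 // non-null for operator nodes
        final List<Node> children;
        Node(String field) { this.field = field; this.op = null; this.children = new ArrayList<>(); }
        Node(Op op, Node... children) { this.field = null; this.op = op; this.children = Arrays.asList(children); }
    }

    // Assigns each leaf the scheme its parent operator needs and records parent/child mismatches.
    static void analyze(Node node, Map<String, Scheme> fieldSchemes, List<String> mismatches) {
        if (node.op == null) return;
        Scheme needed = OPERAND.get(node.op);
        for (Node child : node.children) {
            if (child.op == null) {
                fieldSchemes.put(child.field, needed);
            } else if (RESULT.get(child.op) != needed) {
                mismatches.add(child.op + " -> " + node.op);  // candidate for trusted tier or re-encryption
            }
            analyze(child, fieldSchemes, mismatches);
        }
    }

    public static void main(String[] args) {
        // Expression tree for the condition x + y > alpha.
        Node tree = new Node(Op.COMPARE, new Node(Op.ADD, new Node("x"), new Node("y")), new Node("alpha"));
        Map<String, Scheme> schemes = new LinkedHashMap<>();
        List<String> mismatches = new ArrayList<>();
        analyze(tree, schemes, mismatches);
        System.out.println(schemes + " " + mismatches);
    }
}
```

For \(x + y \gt \alpha\), the sketch assigns an additive scheme to \(x\) and \(y\), an order-preserving scheme to \(\alpha\), and flags the edge between the addition and the comparison as the point where a single cryptosystem no longer suffices.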


5 PHE and PPE for IOT Devices

To ensure confidentiality, sensitive information needs to be encrypted at the source (IoT devices), before it is sent to the public cloud for processing. Encryption of PHE and PPE schemes is commonly computationally expensive, and a straightforward use of these schemes on IoT devices with limited resources is unlikely to be practical. In this section, we introduce a set of optimizations as well as extensions to previously proposed optimizations to reduce time and space overheads associated with PHE and PPE encryption, making these schemes more practical for use in resource-constrained devices, thereby addressing challenge 5.

5.1 Pseudorandom Number Pre-computation

Encryption functions of probabilistic PHE schemes such as Paillier and ElGamal encrypt values by first generating a large pseudorandom number (PRN) and then carrying out computations involving the PRN and the plaintext value. Generating this PRN is oftentimes the most expensive operation of the encryption function, but PRNs can be generated independently of encryption requests and stored for later use. C3PO leverages this fact to improve the performance of encryption. IoT devices in C3PO pre-compute and store a small number of PRNs during times the devices are otherwise idle. When a new value needs to be encrypted, the encryption function first checks whether a yet unused PRN exists and if so uses it to complete the encryption request. Otherwise, the encryption function generates a fresh PRN. In applications where IoT devices take measurements sparsely and have plenty of idle time between measurements to pre-compute PRNs, PRN pre-computation drastically improves encryption performance.
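A minimal sketch of such a PRN pool is shown below. The class `PrnPoolSketch` and its method names are hypothetical; how an actual device detects idle periods and how large the pool should be are left open.

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical pool of pre-computed pseudorandom numbers; each PRN is handed out at most once.
final class PrnPoolSketch {
    private final Deque<BigInteger> pool = new ArrayDeque<>();
    private final int capacity;
    private final int bits;
    private final SecureRandom rnd = new SecureRandom();

    PrnPoolSketch(int capacity, int bits) {
        this.capacity = capacity;
        this.bits = bits;
    }

    // Intended to be called while the device is otherwise idle.
    void refillWhileIdle() {
        while (pool.size() < capacity) pool.addLast(fresh());
    }

    // Returns a stored PRN if one is available, otherwise generates one on demand.
    BigInteger next() {
        BigInteger r = pool.pollFirst();
        return (r != null) ? r : fresh();
    }

    int available() {
        return pool.size();
    }

    private BigInteger fresh() {
        return new BigInteger(bits, rnd);
    }

    public static void main(String[] args) {
        PrnPoolSketch prns = new PrnPoolSketch(4, 2048);
        prns.refillWhileIdle();
        prns.next();                              // served from the pool
        System.out.println(prns.available());     // prints 3
    }
}
```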

5.2 Support for Negative Numbers

C3PO uses Paillier and ElGamal to carry out arithmetic operations homomorphically over encrypted data. By default, these cryptosystems, as well as existing PHE-based systems that use these cryptosystems [48, 59, 61], do not support operations that involve negative numbers. To add support for negative numbers in C3PO, we introduce alternative implementations for the encryption and decryption functions for both Paillier and ElGamal.

To achieve this, we leverage the fact that most homomorphic encryption schemes, including Paillier and ElGamal, operate on large plaintext and ciphertext spaces, as discussed in Section 2.2. C3PO uses Paillier and ElGamal with a 2,048-bit plaintext space and a 4,096-bit ciphertext space. In comparison, the actual message space required by applications is much smaller, e.g., 32 bits for int values or 64 bits for long values. We, hence, define a configurable parameter \(\delta {} \ge 1\) that divides the plaintext space into two distinct parts. We treat values in the range \([0, \lfloor \frac{N}{\delta } \rfloor)\) as positive numbers and values in the range \([\lfloor \frac{N}{\delta } \rfloor , N)\) as negative numbers. When \(\delta = 1\) all values are treated as positive. By default, we set \(\delta = 2\), which divides the plaintext space into two roughly equal parts. For both Paillier and ElGamal, we define encryption that can handle negative numbers as: (6) \[\begin{equation} E^{\prime }(x) = {\left\lbrace \begin{array}{ll} E(x \bmod N) & \text{if}\ -\lceil N(1 - \frac{1}{\delta }) \rceil \le x \lt \lfloor \frac{N}{\delta } \rfloor \\ \varnothing & \text{otherwise,} \end{array}\right.} \end{equation}\] where \(E(x)\) denotes the original definition of encryption that does not support negative numbers with \(x \bmod N\) always returning a non-negative value even for \(x \lt 0\) and \(\varnothing\) indicates an invalid ciphertext was returned because input \(x\) lies outside the supported range. 
For both Paillier and ElGamal, we define decryption that can handle negative numbers as: (7) \[\begin{equation} D^{\prime }(x) = \left(\left(D(x) + {\left\lceil N\left(1 - \frac{1}{\delta }\right) \right\rceil }\right) \bmod N\right) - {\left\lceil N\left(1 - \frac{1}{\delta }\right) \right\rceil } \end{equation}\] or equivalently, and avoiding expensive modulo computations: (8) \[\begin{equation} D^{\prime }(x) = {\left\lbrace \begin{array}{ll} D(x) - N& \text{if}\ D(x) \ge \lfloor \frac{N}{\delta } \rfloor \\ D(x) & \text{otherwise,} \end{array}\right.} \end{equation}\] where \(D(x)\) denotes the original definition of decryption that does not support negative numbers. With the encryption and decryption functions described above, all homomorphic operations can remain unchanged.
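Equations (6) and (8) only change how plaintexts are mapped into and out of \([0, N)\), so they can be illustrated without any cryptography. The sketch below assumes an integer \(\delta\) and uses a small toy modulus; `encode` corresponds to the argument passed to \(E\) in Equation (6), `decode` to the case analysis of Equation (8), and `null` stands in for \(\varnothing\). The identity \(\lceil N(1 - \frac{1}{\delta}) \rceil = N - \lfloor \frac{N}{\delta} \rfloor\) is used to avoid fractions.

```java
import java.math.BigInteger;

final class SignedEncodingSketch {
    // Eq. (6): returns x mod N if x is in the supported range, otherwise null.
    static BigInteger encode(BigInteger x, BigInteger n, BigInteger delta) {
        BigInteger posBound = n.divide(delta);        // floor(N / delta)
        BigInteger negBound = n.subtract(posBound);   // ceil(N (1 - 1/delta))
        if (x.compareTo(negBound.negate()) < 0 || x.compareTo(posBound) >= 0) return null;
        return x.mod(n);                              // BigInteger.mod is never negative
    }

    // Eq. (8): values at or above floor(N / delta) represent negative numbers.
    static BigInteger decode(BigInteger d, BigInteger n, BigInteger delta) {
        return d.compareTo(n.divide(delta)) >= 0 ? d.subtract(n) : d;
    }

    public static void main(String[] args) {
        BigInteger n = BigInteger.valueOf(1_000_003L);   // toy modulus, an assumption for illustration
        BigInteger delta = BigInteger.valueOf(2);
        BigInteger a = encode(BigInteger.valueOf(-5), n, delta);
        BigInteger b = encode(BigInteger.valueOf(3), n, delta);
        // Addition modulo N mimics what an unchanged homomorphic addition does to the plaintexts.
        System.out.println(decode(a.add(b).mod(n), n, delta)); // prints -2
    }
}
```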

5.3 Ciphertext Packing

As mentioned above, the plaintext space of Paillier and ElGamal is larger than the message space needed to represent numbers in applications. For example, encrypting a 32-bit integer value under Paillier or ElGamal will produce a 4,096-bit ciphertext that has a \(128\times\) ciphertext size overhead. To reduce ciphertext size overhead, C3PO adapts a technique introduced by Ge et al. [19] to pack multiple plaintext values into a single ciphertext. Ciphertext packing works by concatenating multiple messages into a single plaintext value before encrypting. For example, values \(a_1, a_2, \ldots , a_n\) can be concatenated into \(a_1 \circ a_2 \circ \cdots \circ a_n\) before being encrypted, where \(\circ\) indicates bit-string concatenation. As homomorphic operations are performed on ciphertexts, the operations are carried out on the underlying packed values separately.

A potential issue when carrying out operations over ciphertexts that pack multiple values are overflows, where the result of one set of packed values overflows into the preceding one, which would lead to incorrect results. Ge et al. demonstrate packing for AHE schemes and solve overflows by using multiple groups and keeping partial sums per group, careful to only pack values in a way that cannot overflow. This approach works well in a database setting assumed by Ge et al. but does not work in a continuous query setting, because values are generated in real-time and cannot be known beforehand. Instead, in this work, we introduce another approach where before each packed value we include a series \(P\) of 0 bits so in case of overflow, the preceding value will not be affected: \(P\circ a_1 \circ P\circ a_2 \circ \cdots \circ P\circ a_n\). Furthermore, in the next few paragraphs, we demonstrate a novel way of ciphertext packing for AHE as well as MHE schemes and a method of packing values after they have been encrypted, which we call post-encryption packing.

AHE packing. Homomorphic addition can be carried out when ciphertexts contain packed values, because arithmetically \((a_1 \circ \cdots \circ a_n) + (b_1 \circ \cdots \circ b_n) = (a_1 + b_1) \circ \cdots \circ (a_n+ b_n)\). To avoid overflows, we calculate the total number of bits, \(T\), that need to be allocated for each packed item and the number of items that can be packed in a single plaintext, \(I\), before encrypting as follows: (9) \[\begin{equation} T= P+ M=\lfloor log_2(R(2^{M}-1)) \rfloor + 1, \end{equation}\] (10) \[\begin{equation} I= \Bigl \lfloor \frac{K}{T} \Bigr \rfloor . \end{equation}\] In the above equations, \(P\) is the number of padding bits needed to capture overflows, \(M\) is the bit size of each message (e.g., \(M= 32\) for int values), \(R\) is the maximum number of tuples containing packed ciphertexts that can be aggregated before padding bits are exceeded, and \(K\) is the bit size of the plaintext space (e.g., \(K= 2,048\) when the modulus of the cryptosystem, \(N\), is 2,048 bits long). Here, \(2^{M}-1\) is the largest possible number that can be represented in \(M\) bits, \(R(2^{M}-1)\) is the largest possible number after \(R\) additions, and \(\lfloor log_2(x) \rfloor + 1\) is the number of bits needed to represent number \(x\). By default, we set \(R= 2^{30}\), which means that when packing 32-bit integers into a single 2,048-bit plaintext, we can fit 33 items before encrypting and can perform over 1 billion operations without exceeding the allotted padding bits. Secondary homomorphic operations are supported after packing but with some alterations. Addition between a packed ciphertext and a packed plaintext is seamlessly supported as per Equation (3). 
Multiplication between a packed ciphertext and a packed plaintext is not supported but multiplication between a packed ciphertext and a single (unpacked) plaintext value is supported using Equation (4), since mathematically \((a_1 \circ \cdots \circ a_n) \times b = (a_1 \times b) \circ \cdots \circ (a_n \times b)\).
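The arithmetic of Equations (9) and (10), and the slot-wise behavior of addition on packed values, can be checked on plaintexts alone. The following sketch uses hypothetical helper names and operates on unencrypted `BigInteger` values; under an AHE scheme the same additions would be carried out homomorphically on the packed ciphertexts.

```java
import java.math.BigInteger;

final class AhePackingSketch {
    // Eq. (9): T = floor(log2(R (2^M - 1))) + 1, which is the bit length of R (2^M - 1).
    static int bitsPerItem(int m, BigInteger r) {
        BigInteger maxValue = BigInteger.ONE.shiftLeft(m).subtract(BigInteger.ONE);
        return r.multiply(maxValue).bitLength();
    }

    // Eq. (10): I = floor(K / T).
    static int itemsPerPlaintext(int k, int t) {
        return k / t;
    }

    // Concatenates non-negative values into T-bit slots; the first value ends up in the highest slot.
    static BigInteger pack(long[] values, int t) {
        BigInteger packed = BigInteger.ZERO;
        for (long v : values) packed = packed.shiftLeft(t).add(BigInteger.valueOf(v));
        return packed;
    }

    static long[] unpack(BigInteger packed, int count, int t) {
        long[] out = new long[count];
        BigInteger mask = BigInteger.ONE.shiftLeft(t).subtract(BigInteger.ONE);
        for (int i = count - 1; i >= 0; i--) {
            out[i] = packed.and(mask).longValue();
            packed = packed.shiftRight(t);
        }
        return out;
    }

    public static void main(String[] args) {
        int t = bitsPerItem(32, BigInteger.ONE.shiftLeft(30));          // M = 32, R = 2^30
        System.out.println(t + " " + itemsPerPlaintext(2048, t));       // prints 62 33
        BigInteger sum = pack(new long[] {1, 4294967295L, 7}, t)
                .add(pack(new long[] {2, 4294967295L, 8}, t));
        System.out.println(java.util.Arrays.toString(unpack(sum, 3, t))); // prints [3, 8589934590, 15]
    }
}
```

Note that the middle slot exceeds 32 bits after the addition; the padding bits absorb the carry so the neighboring slots stay intact.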

MHE packing. C3PO also supports packing for MHE, but to a limited degree, because in multiplication each packed item of a ciphertext is multiplied with all packed items of the other ciphertext. We therefore fix the number of packed items to 2 and now, arithmetically, we have \((a_1 \circ a_2) \times (b_1 \circ b_2) = (a_1 \times b_1) \circ (a_1 \times b_2 + a_2 \times b_1) \circ (a_2 \times b_2),\) which includes the intermediate term \((a_1\,\times \,b_2 + a_2 \,\times \, b_1)\). By ignoring the intermediate term, we get the required \((a_1 \times b_1) \circ (a_2 \times b_2)\). Note that every multiplication generates an additional term in the ciphertext. To ignore these terms after decrypting, we extend the ciphertext of our MHE scheme to include a counter indicating how many multiplications have been performed to generate that ciphertext. When decrypting, this counter is used to identify how many intermediate terms need to be ignored to get the correct result. We compute the total number of bits, \(T\), allocated for each packed item as (11) \[\begin{equation} T= P+ M= \lfloor log_2(2^{M} - 1)^{R} \rfloor + 1 \approx MR \end{equation}\] where \((2^{M} - 1)^{R}\) is the maximum possible number after \(R\) items of \(M\) bits each are multiplied together. We calculate the number of tuples, \(R\), that can be aggregated as follows: (12) \[\begin{equation} R= \Bigl \lfloor \frac{K}{T} \Bigr \rfloor - 1 \Rightarrow R\approx \Bigl \lfloor \frac{\sqrt {M^2+4MK}-M}{2M} \Bigr \rfloor . \end{equation}\] In the above equation, \(-1\) accounts for the intermediate terms discussed above. By replacing \(T\) with the approximation \(MR\) and solving for the positive root of the quadratic equation, we get the final term. The above equations show that when packing 32-bit integers into a plaintext space of 2,048 bits a total of \(R= 7\) packed ciphertexts can be multiplied together before an overflow exceeds the padding bits. 
To continue performing multiplications after this, the packed ciphertext needs to be refreshed by being re-encrypted using a trusted node. This suggests that MHE packing is mostly useful in applications that need to perform multiplications infrequently or when there are frequent key changes (discussed in Section 6) as part of which the packed ciphertext can be refreshed. MHE packing does not support secondary operations.
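As a plaintext-level illustration of MHE packing, the sketch below evaluates the closed form of Equation (12) and shows how a multiplication counter locates the two useful slots after decryption. The helper names, the 64-bit slot width, and the small operand values are assumptions; the products are computed on unencrypted integers, which is what a multiplicatively homomorphic scheme would yield as long as the product stays below the modulus.

```java
import java.math.BigInteger;

final class MhePackingSketch {
    // Closed form of Eq. (12): R ~ floor((sqrt(M^2 + 4MK) - M) / (2M)).
    static int maxAggregated(int m, int k) {
        double root = Math.sqrt((double) m * m + 4.0 * m * k);
        return (int) Math.floor((root - m) / (2.0 * m));
    }

    // Packs two items as a1 * 2^T + a2.
    static BigInteger pack2(long a1, long a2, int t) {
        return BigInteger.valueOf(a1).shiftLeft(t).add(BigInteger.valueOf(a2));
    }

    // After 'mults' multiplications the product of the first items sits (mults + 1) slots up,
    // the product of the second items sits in the lowest slot, and all slots in between are ignored.
    static BigInteger[] unpack2(BigInteger product, int mults, int t) {
        BigInteger low = product.mod(BigInteger.ONE.shiftLeft(t));
        BigInteger high = product.shiftRight((mults + 1) * t);
        return new BigInteger[] { high, low };
    }

    public static void main(String[] args) {
        System.out.println(maxAggregated(32, 2048));   // prints 7
        int t = 64;
        BigInteger product = pack2(3, 5, t).multiply(pack2(7, 11, t));
        BigInteger[] result = unpack2(product, 1, t);
        System.out.println(result[0] + " " + result[1]); // prints 21 55
    }
}
```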

Post-encryption packing. The packing techniques described above require multiple messages to be packed into a single plaintext before being encrypted. Often it is not possible to pack messages before they are encrypted, e.g., when the messages are generated over time and not available at the moment of encryption. It is still beneficial to pack values after they have been encrypted through post-encryption packing to reduce ciphertext size and decryption times. Post-encryption packing is particularly useful for continuous query applications that retain aggregated values over long periods of time, e.g., to keep track of daily, weekly, monthly, or yearly statistics. These aggregated values can be packed together as a single ciphertext value. We support post-encryption packing for AHE schemes. To pack multiple ciphertexts into a single one, we first (homomorphically) shift the ciphertexts appropriately and then add them up: (13) \[\begin{equation} c_p = \sum \limits _{i=0}^{n-1} c_i \otimes 2^{iT} . \end{equation}\]

In the above equation, \(\otimes\) denotes homomorphic multiplication between a ciphertext and a plaintext value (see Section 2.2), and \(T\) indicates the total number of bits required per packed item including padding bits, calculated as per Equation (9). \(n \le I\) is the number of ciphertexts to pack together where \(I\) is calculated using Equation (10), \(\sum\) denotes homomorphic summation, and \(c_p\) is the resulting packed ciphertext.
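Equation (13) can be exercised end to end with a toy Paillier instance, where \(c \otimes k\) is realized as \(c^k \bmod N^2\) and homomorphic summation as multiplication modulo \(N^2\). The sketch below is self-contained and illustrative only; the class name, the 512-bit key, the generator \(g = N + 1\), and the slot width \(T = 62\) are assumptions.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Toy Paillier with post-encryption packing (illustrative only; not secure parameters).
final class PostEncryptionPackingSketch {
    final BigInteger n, n2, lambda, mu;
    private final SecureRandom rnd = new SecureRandom();

    PostEncryptionPackingSketch(int bits) {
        BigInteger p = BigInteger.probablePrime(bits / 2, rnd);
        BigInteger q;
        do { q = BigInteger.probablePrime(bits / 2, rnd); } while (q.equals(p));
        n = p.multiply(q);
        n2 = n.multiply(n);
        BigInteger p1 = p.subtract(BigInteger.ONE), q1 = q.subtract(BigInteger.ONE);
        lambda = p1.multiply(q1).divide(p1.gcd(q1));
        mu = lambda.modInverse(n);
    }

    // Enc(m) = (1 + mN) * r^N mod N^2, using the assumed generator g = N + 1.
    BigInteger encrypt(long m) {
        BigInteger r = new BigInteger(n.bitLength() - 1, rnd).add(BigInteger.ONE);
        BigInteger gm = BigInteger.ONE.add(n.multiply(BigInteger.valueOf(m))).mod(n2);
        return gm.multiply(r.modPow(n, n2)).mod(n2);
    }

    BigInteger decrypt(BigInteger c) {
        return c.modPow(lambda, n2).subtract(BigInteger.ONE).divide(n).multiply(mu).mod(n);
    }

    // Eq. (13): shift ciphertext i by i*T bits (c_i raised to 2^(iT)), then sum homomorphically.
    BigInteger pack(BigInteger[] ciphertexts, int t) {
        BigInteger packed = BigInteger.ONE;   // neutral element of multiplication modulo N^2
        for (int i = 0; i < ciphertexts.length; i++) {
            BigInteger shifted = ciphertexts[i].modPow(BigInteger.ONE.shiftLeft(i * t), n2);
            packed = packed.multiply(shifted).mod(n2);
        }
        return packed;
    }

    // Slot i of the decrypted plaintext holds the i-th original value.
    static long[] unpack(BigInteger plaintext, int count, int t) {
        long[] out = new long[count];
        BigInteger mask = BigInteger.ONE.shiftLeft(t).subtract(BigInteger.ONE);
        for (int i = 0; i < count; i++) out[i] = plaintext.shiftRight(i * t).and(mask).longValue();
        return out;
    }

    public static void main(String[] args) {
        PostEncryptionPackingSketch ph = new PostEncryptionPackingSketch(512);
        BigInteger[] daily = { ph.encrypt(10), ph.encrypt(20), ph.encrypt(30) };
        long[] values = unpack(ph.decrypt(ph.pack(daily, 62)), 3, 62);
        System.out.println(java.util.Arrays.toString(values)); // prints [10, 20, 30]
    }
}
```

A single decryption recovers all three values, which is the source of the reduced decryption time mentioned above.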

5.4 Caching and Speculative Encryption

Deterministic schemes used in C3PO such as DET and OPE produce the same ciphertext for a fixed plaintext value. C3PO leverages this fact by having IoT devices store a small number of encrypted values that are likely to be re-used. Specifically, C3PO uses a map that can hold a fixed number of plaintext to ciphertext key-value pairs and imposes an LRU policy to remove pairs once the map is full. The number of items that the map can hold is configurable and adjusted depending on the memory capacity of each IoT device. We extend the definitions of the encryption functions for DET and OPE to first search this map to see if the plaintext-to-ciphertext key-value pair exists, and if so return the corresponding ciphertext, otherwise encrypt the given plaintext and add it to the map.

To further reduce the encryption time overhead, C3PO uses speculative encryption by predicting what values will need to be encrypted next. C3PO encrypts and stores a small number of plaintext-ciphertext pairs proactively during times that IoT devices are idle. Speculative encryption is mostly useful in scenarios where the range of possible values is small, or for low entropy values, such as when measuring temperature in a closed environment.
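A bounded LRU map of this kind is straightforward to express with `java.util.LinkedHashMap` in access order. The wrapper below is a hedged sketch: the class name `CiphertextCacheSketch`, the `speculate` method, and the use of a generic `Function` as a stand-in for a DET or OPE encryption function are assumptions, not the C3PO implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache in front of a deterministic encryption function.
final class CiphertextCacheSketch<P, C> {
    private final Map<P, C> cache;
    private final Function<P, C> encryptFn;
    int encryptCalls = 0;   // counts real encryptions, for illustration

    CiphertextCacheSketch(final int capacity, Function<P, C> encryptFn) {
        this.encryptFn = encryptFn;
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<P, C>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<P, C> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the cached ciphertext if present; otherwise encrypts and caches the result.
    C encrypt(P plaintext) {
        C ciphertext = cache.get(plaintext);
        if (ciphertext == null) {
            ciphertext = encryptFn.apply(plaintext);
            encryptCalls++;
            cache.put(plaintext, ciphertext);
        }
        return ciphertext;
    }

    // Speculative encryption: pre-populate the cache with values predicted to occur next.
    void speculate(Iterable<P> likelyValues) {
        for (P value : likelyValues) encrypt(value);
    }

    public static void main(String[] args) {
        CiphertextCacheSketch<Integer, String> cache =
                new CiphertextCacheSketch<>(2, x -> "ct(" + x + ")");
        cache.encrypt(21);
        cache.encrypt(22);
        cache.encrypt(21);                       // cache hit, no new encryption
        System.out.println(cache.encryptCalls);  // prints 2
    }
}
```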

5.5 Format-preserving Encryption

Oftentimes, continuous queries include equality comparisons involving values of a fixed format such as dates, timestamps, or phone numbers. To perform equality comparisons, these values need to be encrypted under a deterministic scheme. Naïvely encrypting fixed-format values under a deterministic scheme such as AES will result in a ciphertext that is at least the size of the block of the cryptosystem used. Instead, C3PO employs existing format-preserving encryption techniques via the use of the FNR [15] cryptosystem that generates \(n\)-bits of ciphertext for \(n\)-bits of input plaintext as long as the plaintext is smaller than 128 bits. For plaintext values larger than 128 bits, C3PO uses AES, as shown in Table 1. This optimization allows C3PO to reduce the ciphertext size overhead for values that need to be encrypted under a deterministic scheme. Keeping the ciphertext size overhead small leads to smaller end-to-end latency, because the data that needs to be transmitted from the IoT devices to the cloud nodes for processing is smaller. Furthermore, using FNR to keep ciphertext size smaller is particularly useful when also employing the caching and speculative encryption optimization described above. Since C3PO supports devices with as little as 64 KB of memory, having ciphertexts of smaller size allows IoT devices to retain a larger number of cached ciphertexts in memory.


6 IOT KEY MANAGEMENT

To reduce the risk of secret keys being compromised in continuous query applications, C3PO rotates keys periodically or on-demand without causing disruptions to query executions. In this section, we thereby address challenge 6 and describe key management in C3PO, with a particular focus on the replacement and sharing of keys.

6.1 Key Sharing

As mentioned in Section 3.2, we consider a setting where all IoT devices are owned by a single party and managed by a trusted key manager that distributes keys to IoT devices. To limit the sharing of keys across devices and over time, we employ the following techniques in C3PO:

  • Key rotation: update keys periodically (or in the event of a key compromise) without service disruption.

  • Multi-group mode: limit the number of devices that share an encryption key by splitting devices into multiple groups, each with a different key.

  • Field-level key identification: ensure fields that are not part of a common operation do not share encryption keys.

6.2 Key Rotation

A key challenge of using homomorphic encryption for IoT-based streams is that applications are often long-lived, which increases the chance of key compromises. To mitigate this, C3PO allows encryption keys to be rotated (updated) periodically or on-demand. Periodic key rotations could be part of a security policy. Further, if a key has been compromised, then an on-demand key rotation is initiated. Due to the nature of continuous queries and because homomorphic operations can only be carried out on operands encrypted with the same key, key rotations are not straightforward. Thus, we first start with an explanation of how keys are rotated in the general case and then present how to handle queries that involve aggregated values or sliding windows. Then, we present a method that helps reduce the number of keys that need to be rotated in case of a compromise.

General case. C3PO supports key rotations without disrupting the output. Usually, all IoT devices are considered to form a single logical group and share the same keys. A direct consequence of this is that every key leak will result in all devices needing to update their encryption keys, a problem that we address shortly with our multi-group mode. When a key change is initiated by the key manager, each IoT device first emits a key change marker that includes the new key identifier. When the C3POVertex base class detects the key change marker in the stream, it creates a new instance of the application vertex class to process the stream. Any encrypted constants or literals involved in the computations are re-encrypted under the new key by invoking the trusted tier, at which point computation is moved to the new instance and the old instance is abandoned.


Fig. 4. Key change in continuous queries. Streams flow from left to right (rightmost element, \( x_1 \), is oldest). After \( x_2 \) is emitted a key change is initiated.

Sliding window. The general case described above does not consider queries that contain computations involving older values received via the stream such as computations involving a sliding window. An example of such computation would be a query that computes the sum of the last few received items (or similarly, the sum of a certain time interval based on timestamps). In this case, results of computations that occurred under the previous key must be included in subsequent computations. C3PO follows the same process as before, and the stream encrypted with the new key is channeled into the new instance of the vertex class, but this time the stream encrypted with the old key continues processing uninterrupted. Meanwhile, C3PO suppresses emissions from the new vertex instance until the new instance contains tuples spanning the full length of the sliding window. At this point, the old instance of the application vertex class is discarded and the stream from the new instance is emitted. Figure 4 illustrates this. Figure 4(a) shows the effect of a naïve key change in an encrypted data stream. Values are first encrypted with key \(k_1\) and then \(k_2\). Aggregations with sliding windows that span values encrypted with both keys will fail. In Figure 4(b), when a key change is initiated, a specific number of values (equal to the size of the aggregation window) are encrypted with both keys \(k_1\) and \(k_2\). This allows aggregations to preserve semantics, i.e., avoid any disruption in output.

This solution works well as long as the sliding window is small. For large sliding windows, or for queries with aggregation functions that span the entire duration of the query, the above solution becomes inefficient, since for every key change, and as long as the sliding window does not end, another stream is added. Instead, C3PO handles this case by using the trusted tier to re-encrypt the current aggregated result under the new key. After the re-encryption step, the query can correctly handle tuples encrypted with new keys.

6.3 Multi-group Mode

To reduce the surface of affected devices in the event of a key compromise, C3PO introduces the multi-group mode. In this mode, IoT devices are grouped into logical subsets and a different key is assigned to each set (where otherwise the key would be the same). IoT devices can be grouped together based on any user-defined criteria. Devices that are behind the same gateway usually make a good grouping. This allows us to rotate the encryption keys of devices behind a specific gateway independent of devices outside the gateway.

Operations across groups. At the processing end, multiple groups lead to additional tasks for C3POVertex. This is because when there are multiple groups using different keys, C3PO cannot combine results from all groups entirely in the cloud. To complete such queries, C3PO keeps track of each device group and saves query results of each group separately. The results from each individual group are then combined together to get the full result on the client side. Note that C3PO internally does not truly distinguish between single-group and multi-group modes; the former is simply a special case of the latter with 1 group.


List. 3. C3PO combiner implementation for aggregation in multi-group mode.

Key rotation. So far the discussion about key rotations assumed a single logical group containing all IoT devices. In this setting, in case of a key compromise, all devices need to replace the compromised key with a new one. The multi-group mode allows us to initiate key rotations for any individual group. List 3 shows a C3POCombiner implementation for the group by... sum example shown in List 1. In multi-group mode, each stream with multiple groups is associated with a combiner capable of combining the results of all key groups. In this example, the combiner class overrides the combine() and emit() functions in the C3POCombiner base class (Lines 4 and 8). The combine() function sums values corresponding to the same key group, generates intermediate sums per group, and stores them in a map with the group as the key and the sum as the value of the map. Once all values per key are combined the emit() function emits each key-value pair as a separate tuple. The receiver of these tuples (1 tuple per group) can finally compute the total sum by adding the intermediate sums together after decrypting them.

6.4 Field-level Key Identification

For operations to be performed over encrypted data, fields involved in the same operation must be encrypted using the same key. Conversely, fields not involved in the same operation should use a different key to prevent leaking relations between fields unnecessarily and to minimize the impact of compromised keys. For example, to perform the operation \(x_1+x_2\) both \(x_1\) and \(x_2\) must be encrypted with the same AHE scheme and using the same key, or the operation will generate a wrong result. Separately, if we also need to perform the operation \(x_3+x_4\), then \(x_1\) and \(x_2\) need to be encrypted under the same key, but \(x_3\) and \(x_4\) can be encrypted under a different key even though all four fields need to be encrypted under an AHE scheme to carry out the addition. We capture these field groupings by assigning fields into “field families” that indicate which fields are involved in the same operations, directly or indirectly. Following this intuition, we derive two invariants that need to hold for all keys to minimize data leaks while preserving program correctness.

I1. Correctness

Fields involved in the same kind of operations and belonging to the same field family need to be encrypted with the same key.

I2. Security

Fields involved in different kinds of operations, or belonging to different field families, need to be encrypted with different keys.
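One way to derive field families is to treat every operation as linking its operand fields and to take the connected components of the resulting graph. The sketch below uses a union-find structure for this. It is an assumption about how such a grouping could be computed, not C3PO's actual analysis, and it considers a single kind of operation only (the cryptosystem dimension of I2 is left out).

```python
def field_families(operations):
    """Group fields that share an operation, directly or indirectly."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for fields in operations:
        first = fields[0]
        find(first)  # register fields that appear alone
        for other in fields[1:]:
            union(first, other)

    families = {}
    for field in list(parent):
        families.setdefault(find(field), set()).add(field)
    return sorted(sorted(family) for family in families.values())


ops = [("x1", "x2"), ("x3", "x4")]
print(field_families(ops))                   # two separate families
print(field_families(ops + [("x2", "x3")]))  # one family: indirectly linked
```

With only \(x_1+x_2\) and \(x_3+x_4\), the two pairs land in separate families and can use different keys; adding an operation over \(x_2\) and \(x_3\) merges them into one family that must share a key.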

Fig. 5. C3PO key management.

6.5 Key Generation

The key manager uses key group information and the encryption strategy generated during the homomorphism analysis step of the compilation to decide how to generate keys, as shown in Figure 5. It then associates a key with each field in a manner that satisfies invariants I1 and I2 introduced above. More specifically, keys are generated based on the equation given below: \[ K_{c,f,g} = \textrm{PRP}_{\textrm{MK}}(c, f, g), \tag{14} \] where PRP is a pseudo-random permutation (e.g., AES block cipher) and MK is the master key, known only to the key manager, from which all other keys are derived. Furthermore:

  • Cryptosystem c: c indicates the cryptosystem the key is used for. Since the cryptosystem used indicates the operation that can be performed over encrypted data, by invariant I2, different cryptosystems should lead to different keys.

  • Field family f: f indicates the field family of fields. By invariant I1 fields of the same field family need to be encrypted under the same key. For different field families, C3PO uses a different f, which will generate a different key, even if the cryptosystem is the same.

  • Group g: g captures the group identifier to support multi-group mode. In the general case (single-group mode), there is simply a single all-encompassing group.
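The derivation in Equation (14) can be illustrated with a short sketch. Note the substitution: the text uses a pseudo-random permutation such as AES, whereas this sketch uses HMAC-SHA256 from the Python standard library as a keyed pseudo-random function, purely for illustration. The function name `derive_key`, the string encoding of \(c\), \(f\), and \(g\), and the all-zero master key are assumptions.

```python
import hashlib
import hmac


def derive_key(master_key: bytes, cryptosystem: str, family: str, group: str) -> bytes:
    # ASSUMPTION: HMAC-SHA256 stands in for the PRP of Equation (14).
    # Each component is length-prefixed so that ("ab", "c") and ("a", "bc")
    # never produce the same input message.
    message = b"".join(
        len(part.encode()).to_bytes(2, "big") + part.encode()
        for part in (cryptosystem, family, group)
    )
    return hmac.new(master_key, message, hashlib.sha256).digest()


mk = bytes(32)  # placeholder master key; a real one would be random and secret
k1 = derive_key(mk, "AHE", "f1", "g1")
k2 = derive_key(mk, "AHE", "f1", "g1")  # same (c, f, g): same key (I1)
k3 = derive_key(mk, "AHE", "f2", "g1")  # different field family: different key (I2)
k4 = derive_key(mk, "DET", "f1", "g1")  # different cryptosystem: different key (I2)
print(k1 == k2, k1 != k3, k1 != k4)
```

Since every key is a deterministic function of MK and the triple \((c, f, g)\), the key manager only needs to store the master key and can regenerate any derived key on demand.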

Once the keys are generated, the key manager opens a secure channel with each IoT device and sends only the keys each device requires, according to what fields each device generates and what group it belongs to. To keep track of what keys are sent to each device and to be able to identify which devices need to be sent new keys during key rotation, the key manager keeps a map of key IDs per device (key metadata).
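The key metadata described above can be pictured as a map from devices to key identifiers. The following is a minimal sketch under assumed names (`key_metadata`, `send_keys`, `devices_to_rekey`) and an in-memory dictionary; it does not reproduce the key manager's actual storage format.

```python
from collections import defaultdict

# device id -> set of key ids (cryptosystem, field family, group) sent to it
key_metadata = defaultdict(set)


def send_keys(device, fields_generated, group, encryption_strategy):
    """Record the keys a device needs: one per (cryptosystem, family, group)."""
    for field in fields_generated:
        cryptosystem, family = encryption_strategy[field]
        key_metadata[device].add((cryptosystem, family, group))


def devices_to_rekey(key_id):
    """Devices that hold `key_id` and must receive its replacement."""
    return sorted(d for d, keys in key_metadata.items() if key_id in keys)


# Hypothetical encryption strategy: field -> (cryptosystem, field family)
strategy = {"reading": ("AHE", "f1"), "meter_id": ("DET", "f2")}
send_keys("dev-a", ["reading", "meter_id"], "g1", strategy)
send_keys("dev-b", ["reading"], "g1", strategy)
send_keys("dev-c", ["reading"], "g2", strategy)

print(devices_to_rekey(("AHE", "f1", "g1")))
```

In this example only the devices of group g1 need a new AHE key for family f1; the device in group g2 is unaffected.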

7 C3PO Deployment and Security Analysis

Next, we describe how graphs are deployed in the C3PO runtime (challenge 7) and discuss C3PO’s security properties.

7.1 Deployment Profile Generation

As defined in Section 2, the number of runtime tasks assigned for each vertex in the graph is called the deployment profile of the graph. A good deployment profile is required to avoid bottlenecks and ensure good resource utilization.

Utilization. To reason about the effectiveness of deployment profiles, we define utilization of a vertex for a time interval as the amount of time the vertex spends processing during that time interval. For instance, if a vertex spends 5 minutes in a 10 minute interval processing tuples and the rest of the time waiting for tuples to arrive, then it has a utilization of 0.5. As the utilization of a vertex approaches 1, we can assume it is starting to become a bottleneck. Good resource utilization is usually achieved by the programmer explicitly specifying the number of tasks for each computation vertex. Programmers are perfectly suited to do this, as they understand, via application logic, which vertices handle more data or computation and can correspondingly allocate more tasks for those vertices. In C3PO, when the computation graph is transformed and operations are converted to their cryptographic equivalents, the utilization of a vertex changes substantially. This means that programmers need to thoroughly understand the overheads of each cryptosystem, which goes against C3PO’s design goals.
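As a small worked example of this definition, the sketch below computes utilization from a list of busy intervals. The function name and the interval representation are assumptions, and the busy intervals are assumed not to overlap one another.

```python
def utilization(busy_intervals, window_start, window_end):
    """Fraction of [window_start, window_end) that a vertex spent processing."""
    busy = 0.0
    for start, end in busy_intervals:
        # Clip each busy interval to the observation window.
        overlap = min(end, window_end) - max(start, window_start)
        busy += max(0.0, overlap)
    return busy / (window_end - window_start)


# 5 minutes (300 s) of processing spread over a 10-minute (600 s) window.
print(utilization([(0, 120), (200, 380)], 0, 600))
```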

Fig. 6. C3PO deployment heuristic.

Heuristic. We propose a linear programming-based heuristic that automatically converts the deployment profile for a plaintext graph into an optimized deployment profile for the corresponding C3PO graph. Figure 6 shows the formal representation of the heuristic that we use. S represents the slots available for instances to use and V represents the vertices in the graph that need to be allocated. A slot is typically a Java virtual machine (JVM) or an executor thread within a JVM. We assume all slots have the same processing capacity. Matrix A represents how much each vertex amplifies its input. A is derived by executing the plaintext graph on sample data. To compute the amount of data arriving at a vertex, we consider all paths of varying lengths that end up at that vertex from the source. Matrix A gives the amount of data at vertices that are one edge away from the source (for unit input). To find data arriving at the vertex i (represented by \(d_i\)) through paths of length 2 and higher, we compute the power matrix of A represented in Figure 6 as \(A^2\), \(A^3\), and so on. Vector C represents the load on each vertex relative to one another. C is derived by inverting the number of instances for each vertex (from the deployment profile) in the plaintext version of the graph and then scaling it with respect to the crypto operations performed by the vertex.
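The path-summing step can be illustrated as follows. Since Figure 6 is not reproduced here, the indexing convention is an assumption: `A[i][j]` is taken to be the amount of data vertex j receives per unit of data arriving at vertex i, the graph is assumed acyclic, and the example amplification values are made up.

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def data_arriving(A, source=0):
    """Sum the source row of A, A^2, ..., A^(n-1) (enough for an acyclic graph)."""
    n = len(A)
    d = [0.0] * n
    power = A
    for _ in range(n - 1):
        for j in range(n):
            d[j] += power[source][j]
        power = mat_mul(power, A)
    return d


# Hypothetical 4-vertex graph: v1 -> v2 (x1.0), v1 -> v3 (x0.5),
# v2 -> v4 (x2.0), v3 -> v4 (x1.0).
A = [
    [0.0, 1.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(data_arriving(A))
```

Here \(v_4\) is reached only through paths of length 2, so its load comes entirely from \(A^2\): \(1.0 \cdot 2.0 + 0.5 \cdot 1.0 = 2.5\) units per unit of source input.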

For example, assume that a programmer specifies the number of instances for each vertex as \(v_1:1, v_2:3, v_3:2, v_4:6\) for the plaintext graph presented in Figure 2. We create a vector of these values and invert them to get the vector \([1, 1/3,1/2,1/6]\), normalized to \([6, 2, 3, 1]\). The intuition here is that for vertices that come under heavy load, the programmer will allocate a higher number of instances in the deployment profile to accommodate the load. At the next step, we scale down the value of each element in the above vector based on a reduction factor. This reduction factor is derived empirically based on our observations. We use a reduction factor of 6 for re-encryption nodes, 3 for AHE and MHE schemes, and 2 for all other crypto operations. Consequently, and based on Figure 2, if \(v_3\) receives a stream (\(s_2\)) with a field encrypted under AHE, then we scale down the value corresponding to \(v_3\) to 1, after which, we get \(C=[6, 2, 1, 1]\), and repeat for other vertices.
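The arithmetic of this example can be reproduced with a few lines of code. The sketch assumes that "normalizing" means multiplying by the least common multiple of the instance counts and that vertices without crypto operations keep a factor of 1; both are interpretations of the text, and the function and label names are hypothetical.

```python
from fractions import Fraction
from math import lcm  # requires Python 3.9+

# Reduction factors reported in the text; "none" (factor 1) is an assumption
# for vertices that perform no crypto operations.
REDUCTION = {"reenc": 6, "ahe": 3, "mhe": 3, "other": 2, "none": 1}


def load_vector(instances, crypto_ops):
    inverted = [Fraction(1, n) for n in instances]  # [1, 1/3, 1/2, 1/6]
    scale = lcm(*instances)                         # 6
    normalized = [f * scale for f in inverted]      # [6, 2, 3, 1]
    return [n / REDUCTION[op] for n, op in zip(normalized, crypto_ops)]


# v3 receives a stream with an AHE-encrypted field; the others are left unscaled.
C = load_vector([1, 3, 2, 6], ["none", "none", "ahe", "none"])
print([str(c) for c in C])
```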

In Figure 6, T represents the deployment profile, and \(t_i\) represents the number of slots allocated to execute vertex \(v_i\). Our target now is to derive each \(t_i\). We define two sets of constraints. The first set of constraints ensures each vertex is allocated to at least one slot. The second set of constraints ensures that for all vertices, the load, c, is less than the capacity of the nodes to process it. Under these constraints, we maximize the amount of data that can be consumed at the source vertex.

7.2 C3PO Scheduler

The primary responsibility of the C3PO scheduler is to decide on which host machine(s) each vertex of the graph will be executed. The C3PO scheduler is provided with two lists of hostnames, one that lists hosts in the untrusted cloud, and another that lists hosts in the trusted tier. The scheduler reads the graph annotation to identify where each vertex must be executed.

For components that need to be executed in the trusted tier, the scheduler sends the appropriate class files to workers running in the trusted tier. Trusted tier workers have access to the secret keys required for encryption/decryption. The workers in the untrusted cloud can only access encrypted data and only have access to the public keys required to perform the homomorphic operations.

We note that the scheduler service can be deployed in the untrusted cloud. An attacker can try to manipulate the scheduler in the following ways:

(a) attempt to execute trusted vertices in the untrusted cloud, and

(b) attempt to execute untrusted code in the trusted tier.

Attack (a) does not compromise confidentiality, since the untrusted cloud does not possess the secret keys required to reveal the plaintext data. However, attack (b) can compromise confidentiality if the attacker is successful in executing malicious code that retrieves secret keys or reads data when they are in plaintext while being re-encrypted. To avoid this, a hash of the vertices to be executed in the trusted tier is generated before deployment. When tasks are delivered to the trusted tier for execution, the trusted tier first computes a hash of the task class and compares it with the hash generated before deployment. Execution proceeds only if the hash is verified.
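The hash check can be sketched as below. The text does not name the hash function, so SHA-256 is an assumption, as are the function names and the representation of a task as the raw bytes of its class file.

```python
import hashlib
import hmac


def digest(class_bytes: bytes) -> str:
    # ASSUMPTION: SHA-256; the text only says "a hash".
    return hashlib.sha256(class_bytes).hexdigest()


def verify_task(class_bytes: bytes, expected: dict, vertex: str) -> bool:
    """Return True only if the delivered class matches the pre-deployment hash."""
    known = expected.get(vertex)
    return known is not None and hmac.compare_digest(digest(class_bytes), known)


# Hashes are recorded before deployment, inside the trusted environment.
expected = {"v6": digest(b"reencrypt-class-bytes")}

ok = verify_task(b"reencrypt-class-bytes", expected, "v6")
bad = verify_task(b"tampered-class-bytes", expected, "v6")
print(ok, bad)
```

The check only helps if the table of expected hashes is itself kept in the trusted tier, out of reach of the possibly compromised scheduler.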

7.3 Security Analysis

In this section, we analyze threats across various system components of C3PO such as IoT devices, cloud nodes, trusted tier, and the network and describe how C3PO addresses these threats.

Threat 1: Cloud compromises. The main security objective of C3PO is to preserve the confidentiality of data at rest and data in use in the presence of a semi-honest adversary. The adversary is expected to have read-only access to the data in persistent storage and the main memory of the cloud nodes. C3PO defends against this attack by never revealing secret keys or plaintext values of sensitive data to the cloud. Through the use of PHE and PPE, sensitive data remains encrypted both when stored and when being used. Furthermore, through the use of secondary homomorphic operations and associated optimizations (Section 5), C3PO allows computations between sensitive and non-sensitive data without revealing information about the sensitive input values or the output values. As discussed in Section 3.1, C3PO can optionally use encryption schemes that reveal relationships among data items. The use of such encryption schemes can be reduced by making use of the trusted tier and through the use of alternative encryption schemes discussed in Section 10.1. Even though C3PO is primarily concerned with a passive attacker, it also prevents active attackers from executing malicious code in the trusted tier by verifying the hash of code deployed in the trusted tier (Section 7.2).

Threat 2: Data in transit attacks. A passive attacker can attempt to mount a network snooping attack to extract keys and sensitive data while they are being transmitted over the network. To ensure the secure delivery of keys, C3PO first establishes a secure TLS-based connection between the key manager and IoT devices and transmits keys over TLS. By default, C3PO sends PHE- and PPE-encrypted data with TLS disabled, since the data is already encrypted. To improve the confidentiality guarantees for data encrypted under PPE schemes, or to prevent integrity attacks in case of an active network attacker, C3PO can be configured to send data through a TLS channel. This means that the data is first encrypted under an appropriate PHE or PPE scheme, and in addition encrypted using a probabilistic authenticated encryption scheme such as AES-GCM, ensuring the confidentiality and integrity of transmitted data. Once the data is received by the cloud nodes, the outer layer of encryption is removed, leaving the data encrypted under a PHE or PPE scheme so the data can be homomorphically analyzed.

Threat 3: IoT device compromises. IoT devices are vulnerable to physical attacks that can compromise secret keys (for asymmetric encryption schemes only the public key is made available to the IoT devices). A subset of these keys is shared between multiple IoT devices, since this is a requirement for homomorphic operation correctness. C3PO does not prevent IoT key compromises altogether but reduces the effect of such key compromises. In particular, C3PO introduces two key invariants and field-level key identification (Section 6.4), which limit the number of keys that need to be shared across IoT devices. Field-level key identification also limits the number of fields encrypted under the same key while maximizing the range of homomorphic operations that can be performed across fields of data. To further reduce the effect of key compromises, C3PO introduces a multi-group mode (Section 6.3) that limits the number of devices that share common keys even further and allows frequent key rotations while minimizing service disruptions.

8 IMPLEMENTATION

In this section, we lay out some of the implementation details of C3PO.

8.1 Storm Integration

C3PO’s processes in the cloud are implemented by modifying Apache Storm. Storm is an online, distributed computation system. Application logic in Storm is packaged into directed graphs called topologies. Vertices of the topologies are computation components and edges represent data flow between components. There are two types of components in Storm:

(1) spouts, which act as event generators, and

(2) bolts, which capture the program logic.

In other words, spouts produce the data streams upon which the bolts operate. Modifications to Storm are limited to implementing a new scheduler by overriding the IScheduler interface and to the way a Storm topology is submitted (StormSubmitter and related classes). These changes add an additional 1,031 lines of Java code to Storm. The C3PO programming interface and cryptographic classes that allow computations over encrypted data (but do not include encryption/decryption functions) are packaged as a separate jar library, implemented in 3,633 lines of Java code. The key manager is implemented in 900 lines of Java code and uses DTLS [38] to establish an end-to-end secure channel with IoT devices. Key metadata are stored in XML files and the key manager includes an XML parser to retrieve this data.

8.2 Cryptosystems

Our cryptosystems, including the extensions and optimizations described in Section 5, are implemented in C and accessed where necessary through the Java Native Interface (JNI). Randomized encryption (RND) is implemented using the AES [13] cryptosystem in CBC mode with a random initialization vector. Deterministic encryption (DET) is implemented using an AES pseudo-random permutation block cipher with a variant of CMC mode [27] with a zero initialization vector. FNR [15] is used as an alternative DET cryptosystem to preserve the format of small values. The cryptosystem of Boldyreva et al. [7] is used as our OPE scheme implementation and that of Song et al. [58] as our SRCH scheme.

We implemented the Paillier [43] cryptosystem as our AHE scheme and followed the approach of Damgård and Jurik [14] to set the generator \(g = N + 1\) for a more optimized implementation of encryption. We also used the Chinese Remainder Theorem to optimize the decryption function of Paillier. Finally, we implemented ElGamal [18] as the MHE scheme. Paillier and ElGamal require arbitrary precision arithmetic computations as part of their encryption, decryption, and homomorphic operations. We implemented three different versions of Paillier and ElGamal, each using a different arbitrary precision arithmetic library, since not all these libraries are supported on all IoT devices. We use the GMP library [62] (version 6.1.2) and its mpz arithmetic primitive when available. Alternatively, we use the OpenSSL library [42] (version 1.1.1) and its BIGNUM arithmetic primitive. For highly resource-constrained devices that do not support GMP or OpenSSL, we use the BigDigits library [6] (version 2.6) and its BIGD arithmetic primitive, which is a very small but less optimized library.
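To illustrate why the choice \(g = N + 1\) simplifies encryption, the following toy Paillier sketch uses tiny primes and Python's built-in big integers. It is not the C implementation described above: the primes are far too small for any real use, and decryption uses the textbook formula rather than the Chinese Remainder Theorem optimization.

```python
import math
import secrets

P, Q = 101, 113               # toy primes; the evaluation uses a 2,048-bit modulus
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)  # requires Python 3.9+
MU = pow(LAM, -1, N)          # valid because g = N + 1 (Python 3.8+)


def encrypt(m):
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    # With g = N + 1, g^m mod N^2 equals 1 + m*N, so computing g^m needs
    # one multiplication instead of a modular exponentiation.
    return (1 + m * N) * pow(r, N, N2) % N2


def decrypt(c):
    # Textbook decryption: L(c^lambda mod N^2) * mu mod N, with L(u) = (u - 1) / N.
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add(c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts.
    return c1 * c2 % N2


print(decrypt(add(encrypt(20), encrypt(22))))
```

The remaining cost of encryption is the exponentiation \(r^N \bmod N^2\), which does not depend on the message.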

9 EVALUATION

We evaluated C3PO using standard benchmarks and use cases. Our evaluation shows that C3PO can preserve confidentiality by executing on encrypted data with 20%–30% higher latency and around \(23\%\) reduction in throughput. We use several scenarios for evaluation as follows:

  • Encryption latency: We assess the feasibility of using C3PO with resource-constrained devices by analyzing the encryption latency of various cryptosystems and associated optimizations used by C3PO on IoT devices of the aforementioned C2 or higher classes (see Section 3.2).

  • Smart meter analytics: We use the Smart* dataset [4] as our input together with queries adapted from IoTBench [2] to compare the throughput of C3PO to vanilla Storm. In this scenario, the volume of processed data is of primary concern. This includes an assessment of field masking.

  • Heartbeat analysis: We use a heartbeat analysis application that computes individual and group statistics. We use this application to evaluate the latency of C3PO; query response times are critical in such healthcare applications for triggering emergency responses in a timely manner. This includes an assessment of PRN pre-computation and post-encryption packing.

  • Yahoo streaming: We also use the more generic Yahoo Streaming Benchmark (YSB) [11] for further evaluating the latency of C3PO. Latency is critical in this benchmark, as the goal is to react quickly to advertisements.

  • Linear road: We use the Linear Road Benchmark (LRB) [1], a stream processing benchmark that requires re-encryption, to assess the effectiveness of our deployment heuristic, analyzing throughput, latency, and resource utilization, which is key to efficient deployment.

  • Multiple groups: We use a microbenchmark to analyze the effects of C3PO’s multi-group feature.

  • New York taxi statistics: Finally, we evaluate the costs of re-keying by computing statistics over a large number of nodes (devices) based on a publicly available data-set [63] from New York taxis released under FOIL (Freedom of Information Law).

9.1 Encryption Latency

To evaluate the feasibility of our approach from the point of view of the end devices, we consider the encryption latency of various cryptosystems on different IoT devices. We use five devices with different computational/memory capacities:

  • 2xl: Amazon AWS m5.2xlarge instance with 3.1 GHz CPU and 16 GB RAM.

  • Pi3: Raspberry Pi 3 Model B with Quad Core 1.2 GHz Broadcom BCM2837 CPU and 1 GB RAM.

  • Pi0: Raspberry Pi Zero W with a 1 GHz 32-bit single-core CPU and 512 MB RAM.

  • A8: ARM Cortex-A8 with 600 MHz 32-bit microprocessor and 256 MB RAM.

  • M3: ARM Cortex-M3 with a 72 MHz 32-bit microprocessor and 64 KB RAM.

We evaluate two PPE schemes (AES and FNR) and two PHE schemes (ElGamal and Paillier), each implemented under different libraries, as explained in Section 8:

(1) the NTV native implementation of AES, which is also used internally in the FNR cryptosystem;

(2) the OpenSSL library;

(3) the GMP library;

(4) the BDS BigDigits library.

We use a 128-bit block for AES and a 2,048-bit modulus for Paillier and ElGamal.

Fig. 7. Encryption latency of various PPE (time in microseconds) and PHE (time in milliseconds) schemes across different IoT devices. y-axis in log scale.

Figure 7 shows the execution time for encrypting a random 128-bit string for AES and FNR and a random 32-bit integer for ElGamal and Paillier. We observe that if SSL is available, then AES and FNR encryption is very fast, taking only 19 µs for AES and 36 µs for FNR on the most computationally constrained device, M3. If SSL is not available, then the much simpler but less performant NTV implementation can be used, which requires 270 µs for AES and 1,137 µs for FNR on an M3 device. As expected, encryption using ElGamal or Paillier is more expensive, since these are asymmetric schemes. Yet on all IoT devices that support the GMP or SSL libraries, ElGamal and Paillier exhibit decent performance. ElGamal encryption implemented using SSL takes 3.9 ms on 2xl, 169.6 ms on Pi3, 193.8 ms on Pi0, and 219.6 ms on A8. Corresponding times for ElGamal implemented using GMP are slightly higher. Similarly, Paillier encryption implemented using SSL takes 7.3 ms on 2xl, 331.5 ms on Pi3, 492.9 ms on Pi0, and 554.7 ms on A8. Because the M3 device supports neither the GMP library nor the BIGNUM primitive in the versions of SSL available for it, ElGamal and Paillier cannot be implemented there using GMP or SSL. They must instead use the less optimized BigDigits library, with which encryption on the M3 device is impractical, requiring several seconds to complete. This justifies the need for the optimizations we propose in Section 5.

Fig. 8. Encryption latency of ElGamal and Paillier with PRN pre-computation across different IoT devices. y-axis in log scale.

Fig. 9. Encryption latency of ElGamal and Paillier with ciphertext packing across different IoT devices. y-axis in log scale.

As shown in Figures 8 and 9, our proposed optimizations dramatically improve the performance of ElGamal and Paillier across all IoT devices. With PRN pre-computation, ElGamal encryption implemented in GMP is slightly faster than SSL and takes only 0.4 µs on a 2xl device, 5 µs on Pi3, 18 µs on Pi0, and 18 µs on A8. Even in the worst case, ElGamal implemented using the less optimized BigDigits library takes only 1.7 ms on the most resource-constrained device, M3. PRN pre-computation has similar benefits for Paillier, with the worst case of implementing Paillier using BigDigits taking 41.6 ms on an M3 device. The ciphertext packing optimization allows us to pack two int values in a single ElGamal ciphertext, which roughly cuts the encryption time in half (packing messages into a single plaintext itself takes minimal time). Packing is even more effective for Paillier, since we can pack up to 33 int items in a single Paillier ciphertext. The results demonstrate that with our proposed optimizations, the encryption latency of the cryptosystems used by C3PO is acceptable even on very resource-constrained devices.
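The intuition behind both optimizations can be sketched with a toy Paillier instance. The details of C3PO's optimizations are given in Section 5 and are not reproduced here, so everything below is an assumption made for illustration: the small Mersenne-prime modulus, the 40-bit slot width (32 value bits plus 8 guard bits), and all function names.

```python
import math
import secrets

# Toy modulus built from two known Mersenne primes (about 150 bits).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)  # requires Python 3.9+
MU = pow(LAM, -1, N)          # valid for generator g = N + 1
SLOT_BITS = 40                # ASSUMPTION: 32 value bits + 8 guard bits per slot


def precompute_pool(size):
    """Offline phase: compute the expensive r^N mod N^2 factors ahead of time."""
    pool = []
    while len(pool) < size:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            pool.append(pow(r, N, N2))
    return pool


def encrypt_online(m, pool):
    """Online phase: a single multiplication; each pooled factor is used once."""
    return (1 + m * N) % N2 * pool.pop() % N2


def decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


def pack(values):
    """Place each 32-bit value in its own slot of one large plaintext."""
    packed = 0
    for i, v in enumerate(values):
        if not 0 <= v < 2**32:
            raise ValueError("value does not fit in 32 bits")
        packed |= v << (i * SLOT_BITS)
    return packed


def unpack(packed, count):
    mask = (1 << SLOT_BITS) - 1
    return [(packed >> (i * SLOT_BITS)) & mask for i in range(count)]


pool = precompute_pool(2)
c1 = encrypt_online(pack([72, 65, 80]), pool)
c2 = encrypt_online(pack([3, 5, 7]), pool)
# Multiplying the ciphertexts adds the packed plaintexts slot by slot.
print(unpack(decrypt(c1 * c2 % N2), 3))
```

The guard bits leave headroom so that repeated additions do not carry from one slot into the next; how many additions are safe depends on the slot width chosen.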

9.2 Smart Meter Analytics

In this evaluation, we study the throughput of C3PO by running a set of analytical queries to analyze the electricity usage of homes. We use the Smart* dataset [4] as our input. This dataset represents electrical meter readings collected over a 24-hour period at the rate of one reading per minute from 443 unique homes, totaling 637,526 records. Each reading is a tuple of three fields: <timestamp, meter-id, meter-reading>. We define throughput as the number of tuples processed by the application graph per unit time. The runtime is configured to avoid dropping tuples by using a fixed-size queue. When the queue becomes full, the source vertices stop emitting tuples. When slots in the queue free up, source vertices start emitting tuples again. To measure throughput, we implement the queries explained in IoTBench [2] for use in streaming systems.

Fig. 10. Smart meter analytics throughput.

Fig. 11. Heartbeat analysis response time.

We used a time window of 60 s and executed the queries for at least 600 s. We ran these queries on four m3.large nodes on Amazon EC2. For C3PO, one of the four nodes was specified as a trusted tier node. The bandwidth of the trusted node was throttled to 8 Mbit/s to simulate a wide area network link. The results of our evaluation are presented in Figure 10. Q1 simply counts the number of readings and achieves 96% of the throughput of the plaintext stream. Queries Q2 to Q6 all perform Paillier additions and achieve 59% to 70% of the Storm (plaintext) throughput. The results also show the effect of disabling field masking (C3PO-FM). For Q1, Q2, Q3, and Q5, we are able to mask one field, resulting in an average 7% increase in throughput. Since queries Q4 and Q6 use all the fields in the stream, no fields could be masked.

9.3 Heartbeat Analysis

Next, we study how C3PO can be used for an online healthcare application like a heartbeat monitor. The end-user application runs on specialized hardware (the monitoring IoT device) and counts the number of heartbeats per minute. The monitoring device uses PRN pre-computation to efficiently encrypt this value and send it to the cloud for processing and storing. PRN pre-computation allows us to use IoT devices with as little computational capability as the M3 node described above. The graph running in the cloud keeps track of daily, weekly, monthly, and yearly statistics. We use post-encryption packing to pack these four values into a single ciphertext, thereby reducing the ciphertext size by a factor of 4. The end-user may request to see these statistics on their device, in which case the data is retrieved from the cloud, decrypted on the monitoring device, and shown to the end-user. The statistics are maintained by two vertices, a “per user” vertex (\(v_1\)) and an “all users” vertex (\(v_2\)). User statistics are distributed across the multiple instances of \(v_1\). \(v_1\) also emits a summary of its per-user statistics every minute, which \(v_2\) groups by week, month, or year to find the average value across all users. The client device emits a message every time the client requests to see a specific data point, in response to which the requested values are retrieved. For this application, the most critical metric is the response time, i.e., the time a user has to wait after requesting to see a metric until the metric is displayed. We deploy \(v_1\) and \(v_2\) on three m3.medium nodes in EC2 and use a single end-user device deployed on an A8 device. We measure the response time as we increase the volume of incoming tuples to \(v_1\) and \(v_2\) by simulating additional end-user devices using an Apache Kafka queue.

Each of these end-user devices, including the one deployed on the A8 node, emits one tuple per minute containing an encrypted timestamp and the encrypted number of heartbeats. The results of this evaluation are presented in Figure 11, where we compare the response time of C3PO, C3PO-PP (C3PO with post-encryption packing disabled), and Storm. The top part of each stacked column indicates decryption overhead. We observe that response times for C3PO, when excluding decryption time, are very close to the plaintext version for up to 1,500 client devices, after which C3PO’s response time degrades due to the increased load on \(v_1\) and \(v_2\) compared to the plaintext stream running on Storm. Decryption is a significant source of overhead. In C3PO without packing, the end-user device receives and decrypts four 4,096-bit Paillier ciphertexts containing the daily, weekly, monthly, and yearly statistics. Packing reduces that to a single ciphertext, which leads to lower network overhead and significantly lower decryption time (~\(4\times\) lower).

9.4 Yahoo Streaming

Fig. 12. YSB latency.

We use the Yahoo Streaming Benchmark (YSB) to study the latency of C3PO. This benchmark simulates an advertising analytics use case with several advertising campaigns, each containing several advertisements. The benchmark reads various events from Apache Kafka, identifies the events relevant to the advertisement campaign, and stores a windowed count of relevant events per campaign. The steps in this analytical processing pipeline are as follows:

(1) Read an event (in JSON format) from a Kafka queue.

(2) Deserialize the JSON formatted event string into individual event fields.

(3) Filter out irrelevant events, based on the event type.

(4) Take a projection of the relevant fields keeping the ad ID and the event time.

(5) Join each event with its associated campaign. This ad-to-campaign mapping information is stored in a Redis in-memory data store.

(6) Take a windowed count of events per campaign and store each window in Redis along with a timestamp of the time the window was last updated in Redis.
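The six steps above can be sketched on plaintext data as follows. The sketch replaces Kafka with a list of JSON strings and Redis with in-memory dictionaries. The choice of "view" as the relevant event type, the 10-second window, and all names are assumptions for illustration, and the last-updated timestamp of step (6) is omitted.

```python
import json
from collections import Counter

# Stands in for the ad-to-campaign mapping kept in Redis.
AD_TO_CAMPAIGN = {"ad-1": "camp-A", "ad-2": "camp-A", "ad-3": "camp-B"}
WINDOW_MS = 10_000  # ASSUMPTION: window length chosen for illustration


def process(raw_events, relevant_type="view"):
    counts = Counter()
    for raw in raw_events:                               # (1) read an event
        event = json.loads(raw)                          # (2) deserialize
        if event["event_type"] != relevant_type:         # (3) filter
            continue
        ad_id, ts = event["ad_id"], event["event_time"]  # (4) project
        campaign = AD_TO_CAMPAIGN.get(ad_id)             # (5) join with campaign
        if campaign is None:
            continue
        window = ts // WINDOW_MS * WINDOW_MS             # (6) windowed count
        counts[(campaign, window)] += 1
    return dict(counts)


events = [
    json.dumps({"ad_id": "ad-1", "event_type": "view", "event_time": 1000}),
    json.dumps({"ad_id": "ad-2", "event_type": "view", "event_time": 2500}),
    json.dumps({"ad_id": "ad-3", "event_type": "click", "event_time": 3000}),
    json.dumps({"ad_id": "ad-3", "event_type": "view", "event_time": 12000}),
]
result = process(events)
print(result)
```

In an encrypted deployment, the filter and join would operate on equality-preserving ciphertexts and the count on data that needs no decryption, which is the kind of mapping from operations to cryptosystems that C3PO's compilation step automates.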

The input data for this evaluation is generated using a Clojure program that generates uniformly random tuples of the following format: <user_id:UUID, page_id:UUID, ad_id:UUID, ad_type:String, event_type:String, event_time:Timestamp, ip_address:String>. These tuples are then sent to the Kafka queue. The benchmark measures the latency that a particular processing system exhibits at a given input load. The test computes the latency (in ms) between the time the last event for a particular campaign window was emitted to Kafka and the time that window was fully processed. Figure 12 shows the results of running this benchmark on encrypted input data (C3PO) and plaintext data (Storm). The results show that on average C3PO operates with only \(23\%\) higher latency than running the same computation over plaintext data.

9.5 Linear Road

We use the popular Linear Road Benchmark (LRB) that models variable toll calculation for a city or county to assess C3PO’s deployment heuristic. LRB simulates vehicles traveling through an expressway with vehicles generating position reports at fixed time intervals. Position reports contain information such as the expressway ID, direction of travel, lane of travel, mile marker, offset within the mile, and so on. These position reports are processed by a toll levying agency to dynamically:

(1) calculate the amount of toll to be levied on the vehicle and

(2) identify accident locations to alert vehicles upstream of the accident.

LRB also specifies latency invariants such as the time within which a toll must be calculated and the time within which an accident has to be identified. The upper limit within which the system needs to report tolls and accidents is 5 s. The benchmark rates the system by the highest number of expressways (L) the system can support while maintaining these invariants. Figure 13(a) shows the Storm topology that implements the standard linear road, and Figure 13(b) shows the transformed C3PO topology. The latter topology contains two new vertices, \(v_6\) and \(v_7\), which are re-encryption vertices that are executed within the trusted tier. We ran the experiment for three hours. The rate at which position reports are emitted for one single expressway is shown in Figure 14. To test the system under both low and high loads, the rate of input is designed to steadily increase up to 1,811 tuples/s.

Fig. 13. LRB graph. Shaded nodes represent re-encryption vertices that are executed within the trusted tier.

Fig. 14. LRB data profile.

Fig. 15. Storm LRB baseline.

Fig. 16. C3PO LRB baseline.

LRB baseline and hypothesis validation. We first ran a baseline deployment of LRB by assigning each vertex a single task. This allows us to observe each individual vertex to see how they consume resources and verify the hypothesis made in Section 7.1 that bottlenecks change when running on encrypted data streams. We plot utilization as defined in Section 7 against time in Figure 15 for Storm with plaintext data and in Figure 16 for C3PO with encrypted data. We observe that in Storm vertices \(v_4\) and \(v_2\) have the highest utilization values until around the 8,000 s mark, and after that vertex \(v_1\) becomes the node with the highest load. This increase is because the number of tuples that require a toll notification increases substantially after 8,000 s. In the transformed C3PO graph running on encrypted streams, \(v_5\) and \(v_1\) come under high load until 8,000 s, and after that the bottleneck at \(v_1\) becomes yet more prominent. This validates our hypothesis that primary bottlenecks differ between graphs running on plaintext vs. encrypted streams.

Table 2. LRB Comparison

| System | Deployment profile | Average response time (ms) | # of expressways supported |
|--------|--------------------|----------------------------|----------------------------|
| Storm  | 5, 4, 1, 3, 2      | 2,694.44                   | 20                         |
| C3PO   | 5, 2, 1, 2, 3, 1, 1 | 2,672.97                  | 15                         |

  • Deployment profile shows vertices \(v_1,\ldots ,v_5\) for Storm and \(v_1,\ldots ,v_7\) for C3PO.

Performance of C3PO deployment profile. Next, we benchmark both the Storm topology graph and the transformed C3PO graph for LRB. We deploy both graphs on 15 m3.large nodes in Amazon EC2 using the best possible configuration so the maximum number of expressways supported can be identified. We show the results in Table 2. For plaintext streams, Storm supports 20 expressways, while C3PO with encrypted streams supports 15 expressways. We also plot response times for all notification-triggering tuples, i.e., the times taken for notifications to be issued from the time the respective tuples enter the system. The response times are shown in Figure 17 for Storm and in Figure 18 for C3PO. Response times for C3PO peak faster than for Storm, but for 15 expressways C3PO is able to maintain the response time below the threshold allowed by the benchmark.

Fig. 17. Response time for LRB on Storm.

Fig. 18. Response time for LRB on C3PO.

Effectiveness of analytical model. The effectiveness of the model can be evaluated by looking at how well it converts the deployment profile for the plaintext streams into the deployment profile for the encrypted streams in C3PO. Vertices with higher utilization values should get more instances to execute them. Table 3 shows the response time of the C3PO deployment profile compared to other deployments. As can be seen, the deployment profile generated by C3PO results in the lowest response time. This profile is also in accordance with Figure 16, which shows that vertices \(v_1\) and \(v_5\) should get the highest numbers of instances.
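The intuition that vertices with higher utilization should receive more instances can be illustrated with a simple proportional allocation. The sketch below is not C3PO's analytical model from Section 7: the function name `allocate_instances`, the largest-remainder rounding, and the utilization values are all assumptions chosen purely for illustration.

```python
def allocate_instances(utilization, total_instances):
    """Split total_instances across vertices in proportion to utilization.

    Every vertex gets at least one instance; the remaining instances are
    distributed with the largest-remainder method (an illustrative choice).
    """
    n = len(utilization)
    if total_instances < n:
        raise ValueError("need at least one instance per vertex")
    spare = total_instances - n
    total_util = sum(utilization)
    if total_util == 0:
        shares = [spare / n] * n
    else:
        shares = [spare * u / total_util for u in utilization]
    alloc = [1 + int(s) for s in shares]
    leftover = total_instances - sum(alloc)
    # Hand the leftover instances to the largest fractional parts.
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


# Hypothetical utilization values for seven vertices (not measured data).
example = allocate_instances([0.9, 0.3, 0.1, 0.3, 0.5, 0.05, 0.05], 15)
print(example)
```

The allocation always sums to the number of available instances, and the most heavily utilized vertex never receives fewer instances than any other vertex.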

9.6 Multiple Groups

In this evaluation, we look at the effectiveness of using multiple key groups as outlined in Section 6. We evaluate the throughput (tuples/s) of a group-by sum operation with a varying number of key groups. The devices emit a two-field tuple <group-name, value>, and the application finds the total sum of values for each group for every one-minute time interval. A key rotation is initiated for one of the groups every two minutes. The devices in the group that is changing keys emit tuples encrypted under both the old key and the new key for a time span of one minute. As outlined in Section 6, when a key rotation is initiated, a special key change tuple is emitted. The C3PO worker node receives the key change notification and keeps track of partial results for the new key group until the next time window starts. Figure 19 shows the results. The system runs at a throughput of 1,763 tuples per second when no keys are changed. If an encryption key is compromised and we are to update the key without using multi-group mode, then we have to emit all input tuples under both the old and the new key for the aggregation time window, reducing throughput to 622 tuples per second. Multi-group mode allows us to rotate the key of a specific group, reducing the impact of key rotations. In Figure 19, we can see that when using multi-group mode with the number of key groups increased to 10, throughput increases from 622 to 1,466 tuples per second. The throughput increases because the impact of rotating a key (the number of tuples to be emitted under both the old and the new key) is smaller. This shows that multi-group mode is an effective way of rotating encryption keys.
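To make the effect of multi-group mode concrete, the following minimal sketch shows a worker that keeps partial sums per (group, key id) while only one group is rotating its key. It uses a toy additive masking scheme that is not secure and is not one of C3PO's encryption schemes; the names `enc`, `dec_sum`, `Worker`, and the key values are hypothetical.

```python
from collections import defaultdict

MOD = 2**61 - 1  # modulus of the toy scheme


def enc(key, value):
    """Toy additive masking (NOT secure): ciphertext = value + key mod MOD."""
    return (value + key) % MOD


def dec_sum(key, cipher_sum, count):
    """Remove `count` copies of the key mask from a sum of ciphertexts."""
    return (cipher_sum - count * key) % MOD


class Worker:
    """Untrusted worker: sums ciphertexts per (group, key id), never sees keys."""

    def __init__(self):
        self.partials = defaultdict(lambda: [0, 0])  # (group, key_id) -> [sum, count]

    def on_tuple(self, group, key_id, ciphertext):
        entry = self.partials[(group, key_id)]
        entry[0] = (entry[0] + ciphertext) % MOD
        entry[1] += 1

    def close_window(self):
        """Return the partial results of this window and start a new one."""
        result, self.partials = dict(self.partials), defaultdict(lambda: [0, 0])
        return result


# Trusted side: one key per (group, key id); the values are hypothetical.
keys = {("A", 0): 1111, ("A", 1): 2222, ("B", 0): 3333}

worker = Worker()
# Group A is rotating from key 0 to key 1, so its devices emit each value twice.
for value in (5, 7):
    worker.on_tuple("A", 0, enc(keys[("A", 0)], value))
    worker.on_tuple("A", 1, enc(keys[("A", 1)], value))
# Group B is not rotating and emits each value once.
for value in (10, 20, 30):
    worker.on_tuple("B", 0, enc(keys[("B", 0)], value))

window = worker.close_window()
sums = {gk: dec_sum(keys[gk], s, c) for gk, (s, c) in window.items()}
emitted = sum(c for _, c in window.values())
```

In this sketch only group A's tuples are duplicated, so 7 tuples are emitted for 5 readings; rotating a single shared key would have required 10. With k key groups of equal size, roughly 1/k of the input is duplicated during a rotation, which matches the direction of the trend reported in Figure 19.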

Table 3. LRB Deployment Profile Response Time

| Deployment profile | Average response time (ms) |
|---|---|
| 5, 2, 1, 2, 3, 1, 1\(^a\) | 2,672.97 |
| 4, 2, 1, 4, 2, 1, 1 | 2,714.30 |
| 5, 4, 1, 3, 2, 1, 1 | 2,781.40 |

  • \(^a\)Deployment profile identified by C3PO.

Fig. 19. Effect of varying the number of key groups.

9.7 New York Taxi Statistics

The New York taxi statistics application finds the 10 most frequent routes during the last 30 minutes of taxi servicing. A route is represented by a starting grid cell and an ending grid cell. The data for this application is based on a publicly available taxi dataset released under FOIL (Freedom of Information Law). The input data contains the locations (latitude and longitude) of passenger pick-ups and drop-offs, MD5 digests of the medallions of the taxis that picked up the passengers, and the trip times. The dataset contains records that span over a year. The application emits an output tuple whenever there is a change in the top-10 values. We define response time as follows: Given a tuple t that causes the top-10 values to change from tuple \(top10_{t-1}\) to \(top10_t\), and a function \(T(x)\) that gives us the time at which tuple x is emitted, the response time is \(T(top10_t) - T(t)\). To evaluate the effect of key changes on response time, we simulate a key change at the beginning of each month. This means that all data emitted with a timestamp within the first 30 minutes of every month is encrypted under both the old and the new key. We deployed this application on 10 m3.large nodes in Amazon EC2. Table 4 summarizes the results of these runs. We can see that C3PO with no key changes completes processing the data with only a 23.8% and 25.1% increase in completion time and average response time, respectively, compared to Storm running on a plaintext stream. Furthermore, the increases in completion time and average response time caused by a monthly key change are minimal (about 1%). Figure 20 shows the response times for the full run of roughly 10,000 s that processes a year of data with a key change in the input data every month. In this plot, we can see intermittent spikes (12 in total) in response time for some tuples around the time a key change is in progress, but the majority of tuples (90th percentile within 31 ms and 99th percentile within 818 ms) see the same response time as when no key change is in effect.
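The query and the response-time definition above can be sketched on plaintext data as follows. This is a single-process illustration, not the C3PO or Storm implementation: the class `TopRoutes`, the helper `timed_on_trip`, the grid-cell labels, and the choice to re-sort all counts on every tuple are assumptions made for brevity.

```python
import time
from collections import Counter, deque

WINDOW_S = 30 * 60  # 30-minute sliding window, in seconds


class TopRoutes:
    """Tracks the k most frequent routes over a sliding time window."""

    def __init__(self, k=10, window_s=WINDOW_S):
        self.k = k
        self.window_s = window_s
        self.events = deque()    # (timestamp, route) in arrival order
        self.counts = Counter()  # route -> occurrences inside the window
        self.top = []

    def on_trip(self, timestamp, route):
        """Add one trip; return the new top-k list if it changed, else None."""
        self.events.append((timestamp, route))
        self.counts[route] += 1
        # Evict trips that have fallen out of the window.
        while self.events and self.events[0][0] <= timestamp - self.window_s:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        # Sort by descending count, then by route for a deterministic order.
        ranked = sorted(self.counts.items(), key=lambda rc: (-rc[1], rc[0]))
        new_top = [route for route, _ in ranked[: self.k]]
        if new_top != self.top:
            self.top = new_top
            return new_top
        return None


def timed_on_trip(tracker, timestamp, route):
    """Return (new_top, response_ms); response_ms is None if top-k is unchanged.

    Approximates T(top10_t) - T(t) inside one process.
    """
    start = time.perf_counter()
    new_top = tracker.on_trip(timestamp, route)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return new_top, (elapsed_ms if new_top is not None else None)


# A route is a (start cell, end cell) pair; the cell labels are made up.
tracker = TopRoutes(k=2)
r1 = tracker.on_trip(0, ("c1", "c2"))
r2 = tracker.on_trip(10, ("c1", "c2"))
r3 = tracker.on_trip(20, ("c3", "c4"))
r4, response_ms = timed_on_trip(tracker, 1800, ("c3", "c4"))
```

Only tuples that change the top-k list produce an output and therefore a response time, which is why the second trip above yields no output. A production engine would likely maintain the ranking incrementally rather than re-sorting on every tuple.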

Table 4. Top-10 Taxi Routes

| System | Completion time (s) | Average response time (ms) |
|---|---|---|
| Storm | 8,106 | 36.05 |
| C3PO\(^a\) | 10,039 | 45.10 |
| C3PO\(^b\) | 10,140 | 46.61 |

  • In C3PO\(^a\), the entire stream is emitted under the same key.

  • In C3PO\(^b\), the stream is emitted with a new key every month.

Fig. 20. Response time of top-10 taxi route query with monthly key change.

10 RELATED WORK

The advent of cloud computing has led to the need for highly scalable stream processing systems and has resulted in the next generation of stream processing systems such as Storm, Heron [34], Spark Streaming [66], and Samza [40]. C3PO’s design is based on Storm, but our proposed concepts for preserving data confidentiality in the context of continuous query execution can be applied to other stream processing systems. In what follows, we give an overview of related work in the area of IoT-based confidentiality-preserving stream processing.

10.1 Computing over Encrypted Data

In his seminal work, Gentry introduced an implementable FHE scheme [20] that has since become more practical [21, 37], but is still not suited for encryption-enabled continuous query processing due to its prohibitive cost. Instead, C3PO follows the approach of several related research works, focusing on using PHE and PPE schemes to perform computations over encrypted data. The OPE scheme used in C3PO is based on the construction of Boldyreva et al. [7], an implementation of which is openly available. Newer OPE constructions exist that offer ideal security [47] and defend against ciphertext frequency analysis [32, 33]. C3PO’s current OPE scheme can be replaced with implementations of the aforementioned schemes to support order comparisons. Similarly, the SWP [58] scheme that C3PO uses for searching over encrypted data can be replaced with other constructions [26] that allow searching over encrypted data.
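As a concrete illustration of what a PHE scheme offers, the sketch below implements a toy version of the Paillier cryptosystem [43], a well-known additively homomorphic scheme: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. It is meant only to show the property; the tiny primes are insecure, the function names are hypothetical, and this section does not state which scheme C3PO applies to a given operator.

```python
import math
import random


def keygen(p, q):
    """Build a Paillier key pair from two distinct primes (toy sizes only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid for generator g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt 0 <= m < n as c = (1 + n)^m * r^n mod n^2 with random r."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # (1 + n)^m is congruent to 1 + m*n modulo n^2.
    return ((1 + m * n) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)


# Toy primes; real deployments use primes of 1,024 bits or more.
n, private_key = keygen(1009, 1013)
c_sum = add_encrypted(n, encrypt(n, 1234), encrypt(n, 4321))
total = decrypt(n, private_key, c_sum)
```

An untrusted party holding only `n` can run `add_encrypted` without learning the operands, which is the kind of operation that lets aggregations such as sums run in the cloud over ciphertexts. The code relies on `pow(x, -1, n)`, available from Python 3.8.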

10.2 PHE-based Systems

CryptDB [48] is a database system focusing on executing SQL queries on encrypted data using PHE. CryptDB uses a proxy to intercept client queries and transform them into queries that operate over encrypted data. Crypsis [59, 60] is a runtime system built on Apache Pig, which analyzes and transforms data flow graphs to generate semantically equivalent graphs that are deployed in public clouds and executed over encrypted data. SecureScala [28] is a domain-specific language in Scala that allows expressing secure programs without requiring any cryptographic knowledge from the programmer. Cuttlefish [53] is another recent system that uses PHE. Cuttlefish is built on Spark and introduces Secure Data Types, which allow an application programmer to specify intrinsic properties about the structure and constraints of data; these in turn enable a set of compilation techniques that let Cuttlefish generate more optimized queries. Seabed [45] introduces an additively symmetric homomorphic encryption scheme to perform aggregations on large encrypted datasets efficiently. Symmetria [52] is a PHE-based system built on top of Spark that introduces two symmetric PHE schemes, replacing the more computationally expensive asymmetric PHE schemes. JEDI [35], an end-to-end encryption scheme, leverages the hierarchical resource structure of IoT systems to delegate keys in a decentralized manner across multiple hops. It incorporates WKD-IBE and AES schemes and assembly-level optimizations to support embedded IoT devices. Talos [57] builds on the capabilities of CryptDB and introduces cryptographic primitives that work on low-power devices. Pilatus [56] introduces an encrypted data-sharing scheme based on re-encryption, with revocation capabilities and in situ key updates. TimeCrypt [9] proposes an efficient encryption scheme based on additive symmetric homomorphic encryption for time series data by mapping the keys to time. Droplet [55] proposes a decentralized access control mechanism to access encrypted data present on the cloud using blockchain technology.

All the aforementioned systems are built around a storage system (e.g., CryptDB, Talos, and Pilatus are based on MySQL; Droplet stores state in an SQLite database; TimeCrypt uses a back-end based on Cassandra; and Crypsis, Cuttlefish, and Seabed are based on Spark batch processing). None of these systems considers streaming workloads as supported by C3PO, and therefore none addresses the issues that arise in this setting, such as key management to limit the effect of key compromises, efficient cloud deployments for streaming applications, or computations that involve sliding windows with possible key changes in between.

10.3 Trusted Hardware-based Systems

Another way to enforce data confidentiality is through the use of specialized hardware that provides a trusted execution environment. An approach that is gaining popularity is to use an Intel SGX\(^7\)-enabled processor, which offers trusted execution environments (so-called enclaves) in which data computations can be carried out confidentially. SGX offers hardware-encrypted and integrity-protected physical memory, which allows data and code to reside in the untrusted cloud. SecureCloud [31] shows how SGX can be used to enable secure and private execution of big data applications in the cloud. Havet et al. [29] describe the design of SecureStreams, a streaming system that uses SGX to preserve confidentiality. SecureStreams uses a Lua VM running inside SGX enclaves to host worker and router components. Workers handle the application logic, and routers use a dispatching policy to handle message passing from one worker to another. Each component is then wrapped in a Docker container for isolation and ease of deployment. SecureStreams demonstrates that a streaming system using SGX is feasible. In contrast, C3PO focuses on PHE and does not require specialized hardware. In addition, C3PO addresses challenges such as managing keys and allows for different deployments that improve performance. Trusted hardware approaches are orthogonal to the PHE approach used by C3PO and could also be used to extend the design of C3PO by allowing secure computations to be performed on trusted hardware in an untrusted cloud. Cuttlefish shows the benefits of using SGX selectively when hitting the limits of PHE (for re-encryption).

11 CONCLUSIONS

We presented C3PO, a practical distributed system for evaluating continuous queries over encrypted data streams in public clouds. C3PO makes computations over encrypted data practical for stream processing by using a novel API, encryption inference, automatic re-encryption, and a set of other original optimizations. The C3PO API allows programmers to develop secure applications with little or no knowledge of the underlying cryptosystems. We evaluated our approach using standard benchmarks and applications, demonstrating its applicability and performance. Our evaluations show that we can meet latency requirements even with high volumes of encrypted traffic.

Footnotes

  1. https://www.smartthings.com.
  2. https://www.nest.com.
  3. In the Star Wars™ saga, C-3PO is a risk-averse droid with a strong need for security and stability.
  4. http://storm.apache.org.
  5. https://github.com/ssavvides/homomorphic-c.
  6. https://github.com/kokke/tiny-AES-c.
  7. https://software.intel.com/en-us/sgx.

REFERENCES

  1. [1] Arasu Arvind, Cherniack Mitch, Galvez Eduardo F., Maier David, Maskey Anurag, Ryvkina Esther, Stonebraker Michael, and Tibbetts Richard. 2004. Linear road: A stream data management benchmark. In International Conference on Very Large Data Bases (VLDB). 480–491.
  2. [2] Arlitt Martin F., Marwah Manish, Bellala Gowtham, Shah Amip, Healey Jeff, and Vandiver Ben. 2015. IoTAbench: An internet of things analytics benchmark. In ACM/SPEC International Conference on Performance Engineering. 133–144.
  3. [3] ARM TrustZone 2020. Retrieved from https://developer.arm.com/ip-products/security-ip/trustzone.
  4. [4] Barker Sean, Mishra Aditya, Irwin David, Cecchet Emmanuel, Shenoy Prashant, and Albrecht Jeannie. 2012. Smart*: An open data set and tools for enabling research in sustainable homes. Workshop on Data Mining Applications in Sustainability 111, 112 (2012), 108.
  5. [5] Benaloh Josh. 1994. Dense probabilistic encryption. In Workshop on Selected Areas of Cryptography. 120–128.
  6. [6] BigDigits Multiple Precision Arithmetic Library 2020. Retrieved from http://www.di-mgt.com.au/bigdigits.html.
  7. [7] Boldyreva Alexandra, Chenette Nathan, Lee Younho, and O’Neill Adam. 2009. Order-preserving symmetric encryption. In International Conference on Theory and Applications of Cryptographic Techniques (EUROCRYPT), Vol. 5479. 224–241.
  8. [8] Bormann Carsten, Ersue Mehmet, and Keranen Ari. 2014. Terminology for Constrained-node Networks. RFC 7228. RFC Editor. Retrieved from https://www.rfc-editor.org/rfc/rfc7228.txt.
  9. [9] Burkhalter Lukas, Hithnawi Anwar, Viand Alexander, Shafagh Hossein, and Ratnasamy Sylvia. 2020. TimeCrypt: Encrypted data stream processing at scale with cryptographic access control. In Networked System Design and Implementation (NSDI). 835–850.
  10. [10] Capkun Srdjan, Buttyán Levente, and Hubaux Jean-Pierre. 2003. Self-organized public-key management for mobile ad hoc networks. IEEE Trans. Mob. Comput. 2, 1 (2003), 52–64.
  11. [11] Chintapalli Sanket, Dagit Derek, Evans Bobby, Farivar Reza, Graves Thomas, Holderbaugh Mark, Liu Zhuo, Nusbaum Kyle, Patil Kishorkumar, Peng Boyang, and Poulosky Paul. 2016. Benchmarking streaming computation engines: Storm, Flink and Spark streaming. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 1789–1792.
  12. [12] Cho Hyunghoon, Ippolito Daphne, and Yu Yun William. 2020. Contact tracing mobile apps for COVID-19: Privacy considerations and related trade-offs. CoRR abs/2003.11511 (2020).
  13. [13] Daemen Joan and Rijmen Vincent. 2002. The Design of Rijndael: The Advanced Encryption Standard. Springer.
  14. [14] Damgård Ivan and Jurik Mads. 2001. A generalisation, a simplification and some applications of Paillier’s probabilistic public-key system. In International Workshop on Public Key Cryptography. 119–136.
  15. [15] Dara Sashank and Fluhrer Scott. 2014. FNR: Arbitrary length small domain block cipher proposal. In Conference on Security, Privacy, and Applied Cryptography Engineering (SPACE), Vol. 8804. 146–154.
  16. [16] Dierks Tim and Rescorla Eric. 2008. The Transport Layer Security (TLS) Protocol. RFC 5246. RFC Editor. Retrieved from http://www.rfc-editor.org/rfc/rfc5246.txt.
  17. [17] Eugster Patrick, Kumar Seema, Savvides Savvas, and Stephen Julian James. 2019. Ensuring confidentiality in the cloud of things. IEEE Pervas. Comput. 18, 1 (2019), 10–18.
  18. [18] Gamal Taher El. 1985. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Trans. Inf. Theory 31, 4 (1985), 469–472.
  19. [19] Ge Tingjian and Zdonik Stanley B. 2007. Answering aggregation queries in a secure system model. In International Conference on Very Large Data Bases (VLDB). 519–530.
  20. [20] Gentry Craig. 2009. Fully homomorphic encryption using ideal lattices. In Symposium on Theory of Computing (STOC). 169–178.
  21. [21] Gentry Craig, Halevi Shai, and Smart Nigel P. 2012. Homomorphic evaluation of the AES circuit. In Annual International Cryptology Conference (CRYPTO), Vol. 7417. 850–867.
  22. [22] Giraud Matthieu, Anzala-Yamajako Alexandre, Bernard Olivier, and Lafourcade Pascal. 2017. Practical passive leakage-abuse attacks against symmetric searchable encryption. In International Joint Conference on e-Business and Telecommunications (ICETE). 200–211.
  23. [23] Goldwasser Shafi and Micali Silvio. 1984. Probabilistic encryption. J. Comput. Syst. Sci. 28, 2 (1984), 270–299.
  24. [24] Grubbs Paul, Sekniqi Kevin, Bindschaedler Vincent, Naveed Muhammad, and Ristenpart Thomas. 2017. Leakage-abuse attacks against order-revealing encryption. In IEEE Symposium on Security and Privacy (SP). 655–672.
  25. [25] Gupta Trinabh, Singh Rayman Preet, Phanishayee Amar, Jung Jaeyeon, and Mahajan Ratul. 2014. Bolt: Data management for connected homes. In Conference on Networked Systems Design and Implementation (NSDI). 243–256.
  26. [26] Hahn Florian and Kerschbaum Florian. 2014. Searchable encryption with secure and efficient updates. In International Conference on Computer and Communications Security (CCS). 310–320.
  27. [27] Halevi Shai and Rogaway Phillip. 2003. A tweakable enciphering mode. In Annual International Cryptology Conference (CRYPTO), Vol. 2729. 482–499.
  28. [28] Hauck Markus, Savvides Savvas, Eugster Patrick, Mezini Mira, and Salvaneschi Guido. 2016. SecureScala: Scala embedding of secure computations. In Symposium on Scala (SCALA). 75–84.
  29. [29] Havet Aurélien, Pires Rafael, Felber Pascal, Pasin Marcelo, Rouvoy Romain, and Schiavoni Valerio. 2017. SecureStreams: A reactive middleware framework for secure data stream processing. In International Conference on Distributed and Event-based Systems (DEBS). 124–133.
  30. [30] Identity Verification and End-to-End Encryption 2020. Retrieved from https://keybase.io/.
  31. [31] Kelbert Florian, Gregor Franz, Pires Rafael, Köpsell Stefan, Pasin Marcelo, Havet Aurelien, Schiavoni Valerio, Felber Pascal, Fetzer Christof, and Pietzuch Peter R. 2017. SecureCloud: Secure big data processing in untrusted clouds. In Design, Automation & Test in Europe Conference & Exhibition (DATE). 282–285.
  32. [32] Kerschbaum Florian. 2015. Frequency-hiding order-preserving encryption. In International Conference on Computer and Communications Security (CCS). 656–667.
  33. [33] Kerschbaum Florian and Tueno Anselme. 2019. An efficiently searchable encrypted data structure for range queries. In European Symposium on Research in Computer Security. 344–364.
  34. [34] Kulkarni Sanjeev, Bhagat Nikunj, Fu Maosong, Kedigehalli Vikas, Kellogg Christopher, Mittal Sailesh, Patel Jignesh M., Ramasamy Karthik, and Taneja Siddarth. 2015. Twitter heron: Stream processing at scale. In International Conference on the Management of Data (SIGMOD). 239–250.
  35. [35] Kumar Sam, Hu Yuncong, Andersen Michael P., Popa Raluca Ada, and Culler David E. 2019. JEDI: Many-to-many end-to-end encryption and key delegation for IoT. In USENIX Security Conference. 1519–1536.
  36. [36] Li Fei, Vögler Michael, Claessens Markus, and Dustdar Schahram. 2013. Efficient and scalable IoT service delivery on cloud. In International Conference on Cloud Computing. IEEE Computer Society, 740–747.
  37. [37] Martins Paulo, Sousa Leonel, and Mariano Artur. 2017. A survey on fully homomorphic encryption: An engineering perspective. ACM Comput. Surv. 50, 6 (Dec. 2017).
  38. [38] Modadugu Nagendra and Rescorla Eric. 2004. The design and implementation of datagram TLS. In Networks and Distributed Systems Security Symposium (NDSS).
  39. [39] Naveed Muhammad, Kamara Seny, and Wright Charles V. 2015. Inference attacks on property-preserving encrypted databases. In International Conference on Computer and Communications Security (CCS). 644–655.
  40. [40] Noghabi Shadi A., Paramasivam Kartik, Pan Yi, Ramesh Navina, Bringhurst Jon, Gupta Indranil, and Campbell Roy H. 2017. Samza: Stateful scalable stream processing at LinkedIn. Proc. VLDB Endow. 10, 12 (Aug. 2017), 1634–1645.
  41. [41] Oliver Nuria and Flores-Mangas Fernando. 2006. HealthGear: A real-time wearable system for monitoring and analyzing physiological signals. In International Workshop on Wearable and Implantable Body Sensor Networks (BSN). 61–64.
  42. [42] OpenSSL Multiple Precision Arithmetic Library 2020. Retrieved from https://www.openssl.org.
  43. [43] Paillier Pascal. 1999. Public-key cryptosystems based on composite degree residuosity classes. In International Conference on Theory and Applications of Cryptographic Techniques (EUROCRYPT). 223–238.
  44. [44] Pandey Omkant and Rouselakis Yannis. 2012. Property preserving symmetric encryption. In International Conference on Theory and Applications of Cryptographic Techniques (EUROCRYPT), Vol. 7237. 375–391.
  45. [45] Papadimitriou Antonis, Bhagwan Ranjita, Chandran Nishanth, Ramjee Ramachandran, Haeberlen Andreas, Singh Harmeet, Modi Abhishek, and Badrinarayanan Saikrishna. 2016. Big data analytics over encrypted datasets with Seabed. In Symposium on Operating Systems Design and Implementation (OSDI). 587–602.
  46. [46] Poddar Rishabh, Boelter Tobias, and Popa Raluca Ada. 2019. Arx: An encrypted database using semantically secure encryption. Proc. VLDB Endow. 12, 11 (2019), 1664–1678.
  47. [47] Popa Raluca Ada, Li Frank H., and Zeldovich Nickolai. 2013. An ideal-security protocol for order-preserving encoding. In IEEE Symposium on Security and Privacy (SP). 463–477.
  48. [48] Popa Raluca A., Redfield Catherine M. S., Zeldovich Nickolai, and Balakrishnan Hari. 2011. CryptDB: Protecting confidentiality with encrypted query processing. In Symposium on Operating Systems Principles (SOSP). 85–100.
  49. [49] Public cloud security incident 2020. Retrieved from https://www.helpnetsecurity.com/2020/07/09/public-cloud-security-incident/.
  50. [50] Rivest Ronald L., Adleman Len, Dertouzos Michael L., et al. 1978. On data banks and privacy homomorphisms. Found. Sec. Comput. 4, 11 (1978), 169–180.
  51. [51] Savvides Savvas. 2020. Practical Confidentiality-Preserving Data Analytics in Untrusted Clouds. DOI: https://doi.org/10.25394/PGS.12645440.v1
  52. [52] Savvides Savvas, Khandelwal Darshika, and Eugster Patrick. 2020. Efficient confidentiality-preserving data analytics over symmetrically encrypted datasets. Proc. VLDB Endow. 13, 8 (Apr. 2020), 1290–1303.
  53. [53] Savvides Savvas, Stephen Julian James, Ardekani Masoud Saeida, Sundaram Vinaitheerthan, and Eugster Patrick. 2017. Secure data types: A simple abstraction for confidentiality-preserving data analytics. In Symposium on Cloud Computing (SoCC). 479–492.
  54. [54] Securely Connect any Device 2020. Retrieved from https://www.zerotier.com/.
  55. [55] Shafagh Hossein, Burkhalter Lukas, Ratnasamy Sylvia, and Hithnawi Anwar. 2020. Droplet: Decentralized authorization and access control for encrypted data streams. In USENIX Security Conference. 2469–2486.
  56. [56] Shafagh Hossein, Hithnawi Anwar, Burkhalter Lukas, Fischli Pascal, and Duquennoy Simon. 2017. Secure sharing of partially homomorphic encrypted IoT data. In Conference on Embedded Networked Sensor Systems (SenSys). 1–14.
  57. [57] Shafagh Hossein, Hithnawi Anwar, Droescher Andreas, Duquennoy Simon, and Hu Wen. 2015. Talos: Encrypted query processing for the internet of things. In Conference on Embedded Networked Sensor Systems (SenSys). 197–210.
  58. [58] Song Dawn Xiaodong, Wagner David A., and Perrig Adrian. 2000. Practical techniques for searches on encrypted data. In IEEE Symposium on Security and Privacy (SP). 44–55.
  59. [59] Stephen Julian James, Savvides Savvas, Seidel Russell, and Eugster Patrick. 2014. Practical confidentiality preserving big data analysis. In Workshop on Hot Topics in Cloud Computing (HotCloud). USENIX Association.
  60. [60] Stephen Julian James, Savvides Savvas, Seidel Russell, and Eugster Patrick Th. 2014. Program analysis for secure big data processing. In International Conference on Automated Software Engineering (ASE). ACM, 277–288.
  61. [61] Stephen Julian James, Savvides Savvas, Sundaram Vinaitheerthan, Ardekani Masoud Saeida, and Eugster Patrick. 2016. STYX: Stream processing with trustworthy cloud-based execution. In Symposium on Cloud Computing (SoCC). 348–360.
  62. [62] The GNU Multiple Precision Arithmetic Library 2020. Retrieved from https://www.gmplib.org.
  63. [63] Whong Chris. 2014. FOILing NYC’s Taxi Trip Data. Retrieved from http://chriswhong.com/open-data/foil_nyc_taxi.
  64. [64] Yang Zhe, Zhou Qihao, Lei Lei, Zheng Kan, and Xiang Wei. 2016. An IoT-cloud based wearable ECG monitoring system for smart healthcare. J. Medical Syst. 40, 12 (2016), 286:1–286:11.
  65. [65] Yu Zhen and Guan Yong. 2008. A key management scheme using deployment knowledge for wireless sensor networks. IEEE Trans. Parallel Distrib. Syst. 19, 10 (2008), 1411–1425.
  66. [66] Zaharia Matei, Das Tathagata, Li Haoyuan, Shenker Scott, and Stoica Ion. 2012. Discretized streams: An efficient and fault-tolerant model for stream processing on large clusters. In Workshop on Hot Topics in Cloud Computing (HotCloud).

          • Published in

            ACM Transactions on Privacy and Security  Volume 25, Issue 1
            February 2022
            219 pages
            ISSN:2471-2566
            EISSN:2471-2574
            DOI:10.1145/3485162
            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 23 November 2021
            • Accepted: 1 June 2021
            • Revised: 1 May 2021
            • Received: 1 May 2020