Poster: StreamGuard: Enabling Secure and Uncensored Video Calls on Mobile Devices

With the increasing use of application-based video calls, the security and privacy of video and audio data have become a major concern. Even when calls are encrypted, the applications that implement the encryption can still access the content of video calls for eavesdropping, censorship, and data mining. To address this issue, we propose StreamGuard, a lightweight system service that provides End-to-End Encryption (E2EE) at the mobile system level, effectively blocking applications from accessing unencrypted data. StreamGuard uses the Signal Protocol for key exchange and AES for encryption. Additionally, StreamGuard's novel architecture supports video preview and editing while maintaining the confidentiality of the data. Our preliminary results show that StreamGuard is feasible with minimal performance overhead.

Despite the sensitive nature of video and audio calls, their security and privacy are often overlooked. A 2016 study of Voice over IP (VoIP) traffic [1] found that numerous well-known messaging applications, such as ICQ and WeChat, only encode voice data rather than encrypting it. Unencrypted traffic leaks personally identifiable information (PII) [2], which adversaries can use to track users.
To address this issue, the Secure Real-Time Transport Protocol (SRTP) encrypts audio and video data in transit between mobile devices and servers, and end-to-end encryption (E2EE) provides even stronger protection by encrypting the data on the sender's device and decrypting it only on the receiver's device. However, because SRTP and E2EE are implemented at the application level, application providers can still access the contents of communication, enabling eavesdropping, censorship [3], and data mining [4].
To enforce E2EE at the mobile system level, we introduce our preliminary work on StreamGuard, a lightweight system service that encrypts video calls before the video and audio streams are passed to applications. As shown in Figure 1, StreamGuard is an Android system service extension that performs E2EE key exchange with a remote party, encrypts video and audio data, and passes the encrypted data to applications for transmission. Through this work, we aim to safeguard the confidentiality and integrity of video calls against potentially malicious applications, their providers, and network adversaries. Because data are encrypted before reaching the application, StreamGuard protects video and audio streams from being intercepted, censored, or otherwise tampered with by adversaries. StreamGuard's novel architecture further supports video preview and editing, allowing applications to display unencrypted video previews and apply video filters, such as background blur, without exposing the data to the applications.
StreamGuard aims to provide strong confidentiality and integrity guarantees using the Signal Protocol [5] for key exchange between contacts, and AES encryption for data streams.

Threat Model
In our threat model, the mobile operating system, including the kernel and system services, and an identity server are trusted. We assume that applications, their developers, and the network are potentially malicious, as they may collect, leak, or tamper with data.

Design
In this section, we discuss the proposed design of StreamGuard. We present a high-level overview of its architecture and describe the design decisions that support video preview, editing, and key exchange.

StreamGuard Service Overview
At the core of StreamGuard, we propose a lightweight Android system service that extends the existing media service. It performs E2EE key exchange, encrypts video and audio streams using AES, and provides the encrypted data to applications through APIs. We envision that StreamGuard extends the existing media service APIs with additional permissions for unencrypted camera and microphone access. Users can selectively grant unencrypted access only to trusted applications. For untrusted applications, the data are encrypted with a per-session key managed by StreamGuard. The encryption process is managed by system services and is beyond the reach of applications.
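To make the per-session key more concrete, the Python sketch below shows one possible way a service could expand a session key (for example, the output of the key exchange) into separate video and audio keys and build unique 96-bit AES-GCM nonces from a per-stream counter. The class name `SessionKeySchedule`, the key labels, and the nonce layout are our illustrative assumptions, not part of StreamGuard's design. The AES-GCM call itself is left out because Python's standard library has no AES, so the sketch stops at the key and nonce that would be handed to it.

```python
import hashlib
import hmac
import secrets


class SessionKeySchedule:
    """Hypothetical per-call key state kept inside the system service.

    Applications never receive these keys; they only see ciphertext.
    """

    STREAM_IDS = {"video": 1, "audio": 2}

    def __init__(self, session_key: bytes) -> None:
        if len(session_key) != 32:
            raise ValueError("expected a 256-bit session key")
        # One HKDF-Expand block (RFC 5869): T(1) = HMAC-SHA256(PRK, info || 0x01),
        # treating the already-uniform session key as the PRK.
        self._keys = {
            name: hmac.new(session_key, f"StreamGuard {name} key".encode() + b"\x01",
                           hashlib.sha256).digest()
            for name in self.STREAM_IDS
        }
        self._counters = {name: 0 for name in self.STREAM_IDS}

    def next_key_and_nonce(self, stream: str) -> tuple[bytes, bytes]:
        """Return the AES-256 key and a fresh 96-bit GCM nonce for the next frame or packet."""
        counter = self._counters[stream]
        if counter >= 2**64:
            raise OverflowError("frame counter exhausted; a new session key is required")
        self._counters[stream] = counter + 1
        # Assumed nonce layout: 4-byte stream ID || 8-byte big-endian counter,
        # so a nonce is never reused under the same key within a session.
        nonce = self.STREAM_IDS[stream].to_bytes(4, "big") + counter.to_bytes(8, "big")
        return self._keys[stream], nonce


# A random key stands in here for the key that the key exchange would produce.
schedule = SessionKeySchedule(secrets.token_bytes(32))
video_key, first_nonce = schedule.next_key_and_nonce("video")
_, second_nonce = schedule.next_key_and_nonce("video")
```

Keeping the counters and keys inside the service object mirrors the idea that only the system service, never the application, decides which key and nonce protect each frame. On Android, the resulting key and nonce would typically be passed to an `AES/GCM/NoPadding` `javax.crypto.Cipher`.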

Preview and Editing
To maintain the functionality of video call applications, we propose a novel architecture for StreamGuard that supports video preview and editing. Because the video data are encrypted before reaching applications, the applications cannot directly access the video data for preview and editing. We propose revising the PreviewView API so that unencrypted camera streams can be displayed on the screen without revealing the data to the application. For editing, we propose a video filter API, JVF (Just a Video Filter), which allows applications to offload video editing tasks to StreamGuard before the data are encrypted. The JVF API provides a set of video filters, such as color correction and background blurring, that complement the existing Android CaptureRequest API. We provide an example of JVF pseudo-code that greys out the background, sets the focus, and adds a balloon picture supplied by the application.
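As an illustration of the filter chain described above (greying out the background, setting focus, and overlaying an app-supplied balloon picture), the Python sketch below models how an application might assemble a chain that the service then runs on plaintext frames before encryption. It is not the authors' JVF pseudo-code: `JvfPipeline`, `grey_background`, `overlay`, and `set_focus` are hypothetical names, frames are plain lists of RGB tuples, and the foreground mask is given rather than computed by a segmentation model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[int, int, int]
Frame = List[List[Pixel]]
Mask = List[List[bool]]  # True where a pixel belongs to the foreground (the caller)
Filter = Callable[[Frame], Frame]


def grey_background(mask: Mask) -> Filter:
    """Turn background pixels grey using BT.601 luma weights; keep foreground pixels."""
    def apply(frame: Frame) -> Frame:
        out = []
        for row, mask_row in zip(frame, mask):
            new_row = []
            for (r, g, b), is_foreground in zip(row, mask_row):
                if is_foreground:
                    new_row.append((r, g, b))
                else:
                    y = round(0.299 * r + 0.587 * g + 0.114 * b)
                    new_row.append((y, y, y))
            out.append(new_row)
        return out
    return apply


def overlay(image: Frame, top: int, left: int) -> Filter:
    """Paste an app-supplied picture (e.g., a balloon) at (top, left), clipped to the frame."""
    def apply(frame: Frame) -> Frame:
        out = [list(row) for row in frame]
        for dy, image_row in enumerate(image):
            for dx, pixel in enumerate(image_row):
                y, x = top + dy, left + dx
                if 0 <= y < len(out) and 0 <= x < len(out[y]):
                    out[y][x] = pixel
        return out
    return apply


@dataclass
class JvfPipeline:
    """Hypothetical JVF chain: the app picks built-in filters; the service runs them."""
    filters: List[Filter] = field(default_factory=list)
    capture_settings: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def add(self, f: Filter) -> "JvfPipeline":
        self.filters.append(f)
        return self

    def set_focus(self, x: float, y: float) -> "JvfPipeline":
        # Focus configures the camera (like a CaptureRequest field), not the pixels.
        self.capture_settings["focus_point"] = (x, y)
        return self

    def run(self, frame: Frame) -> Frame:
        # Would execute inside StreamGuard on plaintext frames, before encryption.
        for f in self.filters:
            frame = f(frame)
        return frame


frame = [[(200, 10, 10), (10, 200, 10)],
         [(10, 10, 200), (90, 90, 90)]]
mask = [[True, False],
        [False, False]]  # fixed here; a segmentation model would compute it per frame
balloon = [[(255, 0, 255)]]
pipeline = JvfPipeline().add(grey_background(mask)).set_focus(0.5, 0.5).add(overlay(balloon, 1, 1))
result = pipeline.run(frame)
```

In this sketch the filters are predefined functions that the application only selects and parameterizes, in line with JVF providing a set of filters; letting applications ship arbitrary filter code into the service would reopen a path for plaintext frames to leak.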

Key Exchange
StreamGuard adopts the Extended Triple Diffie-Hellman (X3DH) key agreement protocol from the Signal Protocol [5] to establish a key at the beginning of each video call. Identity verification in X3DH relies on either pre-shared secrets or a trusted identity server. We propose to use the existing Android account system as the identity server, which associates a verified phone number or email address with a public key. When a video call application dials a contact, StreamGuard queries the identity server to retrieve the contact's X3DH prekey bundle and performs the key exchange to establish a secure session.
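To illustrate the X3DH flow, the Python sketch below performs the four Diffie-Hellman computations from the Signal X3DH specification between a caller and a contact whose prekey bundle is looked up in a dictionary standing in for the proposed Android-account identity server. X25519 is written out from RFC 7748 so the sketch needs only the standard library; that implementation is not constant-time and is for illustration only. Verification of the signed prekey's signature, the associated data, and the initial message are omitted, and the names `identity_server`, `initiate`, `respond`, and the HKDF info string are our assumptions.

```python
import hashlib
import hmac
import secrets

P = 2**255 - 19
A24 = 121665
BASE_POINT = (9).to_bytes(32, "little")


def x25519(k: bytes, u: bytes) -> bytes:
    """X25519 via the RFC 7748 Montgomery ladder (illustrative, NOT constant-time)."""
    kb = bytearray(k)
    kb[0] &= 248
    kb[31] &= 127
    kb[31] |= 64
    scalar = int.from_bytes(kb, "little")
    x1 = (int.from_bytes(u, "little") & ((1 << 255) - 1)) % P
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (scalar >> t) & 1
        swap ^= bit
        if swap:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = x2 + z2, x2 - z2
        aa, bb = a * a % P, b * b % P
        e = aa - bb
        da, cb = (x3 - z3) * a % P, (x3 + z3) * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF-SHA256 (RFC 5869): extract a PRK, then expand it."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def keypair() -> tuple[bytes, bytes]:
    private = secrets.token_bytes(32)
    return private, x25519(private, BASE_POINT)


def x3dh_kdf(km: bytes) -> bytes:
    # X3DH KDF: HKDF over F || KM, with F = 32 bytes of 0xFF for X25519 and a zero salt.
    return hkdf_sha256(b"\xff" * 32 + km, salt=bytes(32), info=b"StreamGuard X3DH")


# Hypothetical identity server: verified account (email/phone) -> published prekey bundle.
bob_ik, bob_spk, bob_opk = keypair(), keypair(), keypair()
identity_server = {"bob@example.com": {"ik": bob_ik[1], "spk": bob_spk[1], "opk": bob_opk[1]}}


def initiate(alice_ik: tuple[bytes, bytes], bundle: dict) -> tuple[bytes, bytes]:
    """Caller side: derive SK and return it with the ephemeral public key to send."""
    ek_priv, ek_pub = keypair()
    dh1 = x25519(alice_ik[0], bundle["spk"])  # DH(IK_A, SPK_B)
    dh2 = x25519(ek_priv, bundle["ik"])       # DH(EK_A, IK_B)
    dh3 = x25519(ek_priv, bundle["spk"])      # DH(EK_A, SPK_B)
    dh4 = x25519(ek_priv, bundle["opk"])      # DH(EK_A, OPK_B)
    return x3dh_kdf(dh1 + dh2 + dh3 + dh4), ek_pub


def respond(alice_ik_pub: bytes, ek_pub: bytes) -> bytes:
    """Callee side: recompute the same four DH values with Bob's private keys."""
    dh1 = x25519(bob_spk[0], alice_ik_pub)
    dh2 = x25519(bob_ik[0], ek_pub)
    dh3 = x25519(bob_spk[0], ek_pub)
    dh4 = x25519(bob_opk[0], ek_pub)
    return x3dh_kdf(dh1 + dh2 + dh3 + dh4)


alice_ik = keypair()
sk_alice, ek_pub = initiate(alice_ik, identity_server["bob@example.com"])
sk_bob = respond(alice_ik[1], ek_pub)
```

Both sides end with the same 32-byte key, which could then seed the per-session keys for the AES-encrypted streams. X3DH also permits dropping DH4 when the server has no one-time prekey left for the contact.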

Preliminary Results
We present preliminary results to show the feasibility of StreamGuard. Using a cryptographic benchmark, we measure AES-256-GCM throughput on a Google Pixel Tablet released in 2023. The device achieves 27.23 Gbps, well above the 85 Mbps required for 4K video streaming. We also measure the latency introduced by the X3DH key exchange in a local setup. It takes 0.150 ± 0.002 ms to establish a session key, which is negligible for video calls. We expect this latency to increase with a remote X3DH server because of network round trips. Because StreamGuard will use the same Android inter-process communication (IPC) and memory-sharing mechanisms (namely, Binder and Ashmem) as the existing Android media services, we expect it to incur no additional IPC overhead.
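The headroom implied by the benchmark can be worked out directly. This counts only raw AES-256-GCM throughput and ignores memory copies, scheduling, and other concurrent work, so it bounds the encryption cost from below rather than measuring end-to-end call overhead.

```latex
\frac{27.23\ \text{Gbps}}{85\ \text{Mbps}}
  = \frac{27{,}230\ \text{Mbps}}{85\ \text{Mbps}} \approx 320,
\qquad
\frac{85\ \text{Mbps}}{27{,}230\ \text{Mbps}} \approx 0.31\%
```

In other words, encrypting a single 4K stream would use roughly 0.3% of the measured AES-256-GCM throughput.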

Figure 1: Architecture of StreamGuard. Purple indicates encrypted data, yellow indicates the protected preview, and blue indicates StreamGuard services. StreamGuard allows a video call application to install a JVF (Just a Video Filter) for pre-encryption video editing.