IoTFlow: Inferring IoT Device Behavior at Scale through Static Mobile Companion App Analysis

The number of "smart" devices, that is, devices making up the Internet of Things (IoT), is steadily growing. They suffer from vulnerabilities just as other software and hardware. Automated analysis techniques can detect and address weaknesses before attackers can misuse them. Applying existing techniques or developing new approaches that are sufficiently general is challenging though. Contrary to other platforms, the IoT ecosystem features various software and hardware architectures. We introduce IoTFlow, a new static analysis approach for IoT devices that leverages their mobile companion apps to address the diversity and scalability challenges. IoTFlow combines Value Set Analysis (VSA) with more general data-flow analysis to automatically reconstruct and derive how companion apps communicate with IoT devices and remote cloud-based backends, what data they receive or send, and with whom they share it. To foster future work and reproducibility, our IoTFlow implementation is open source. We analyze 9,889 manually verified companion apps with IoTFlow to understand and characterize the current state of security and privacy in the IoT ecosystem, which also demonstrates the utility of IoTFlow. We compare how these IoT apps differ from 947 popular general-purpose apps in their local network communication, the protocols they use, and who they communicate with. Moreover, we investigate how the results of IoTFlow compare to dynamic analysis, with manual and automated interaction, of 13 IoT devices when paired and used with their companion apps. Overall, utilizing IoTFlow, we discover various IoT security and privacy issues, such as abandoned domains, hard-coded credentials, expired certificates, and sensitive personal information being shared.


INTRODUCTION
The number of Internet of Things (IoT) devices, that is, smart devices, is rising rapidly: Forecasts expect the number of IoT devices to grow to 25.4 billion in 2030 [45]. These devices collect data about their users and environment to make smart decisions. For example, to call for help in an emergency, a smartwatch may collect health indicators. This means that users need to trust them to handle their data with care. Unfortunately, smart devices have gained notoriety for their security and privacy issues, leading to the catchphrase "the S in IoT stands for security." Notably, employees of Ring had unauthorized access to users' security camera footage uploaded to their cloud backend [54]. Similarly, the European Union (EU) recalled kids' smartwatches because they exposed sensitive information and could be easily compromised by attackers [16].
Prior work extensively analyzed open and closed source desktop and mobile applications (apps) for security and privacy issues, but analyzing smart devices remains an open challenge. Related work in this domain mainly focused on firmware vulnerabilities [21,27] or on analyzing a handful of selected devices [23,44,60,81,91,96]. This, however, does not scale to the wide variety of smart devices with diverse software and hardware architectures. Intuitively, buying thousands of devices to analyze them in a lab setting is financially and practically infeasible.
Therefore, to enable the large-scale discovery and analysis of security and privacy issues in the IoT ecosystem, we propose IoTFlow, a novel static analysis approach for IoT devices via their mobile companion apps. These apps play an important role in controlling IoT devices directly and can serve as intermediaries to their cloud backends. Practically all IoT devices have such apps available for Android and iOS [21,62,63,80]. They allow users to set up and control their devices locally, via the local network or Bluetooth, or remotely, via the Internet. For some devices, their apps are the only gateway to the Internet. Overall, the apps store and process information collected by the IoT devices and about the remote infrastructure. Given the nature of data that the devices collect and use, it may also be highly sensitive. Further, attackers could misuse apps with hard-coded information (e.g., endpoints, credentials) to eavesdrop on others' private information, or distribute malicious content via misconfigured IoT backends or abandoned domains. Using a misconfigured backend, they could exploit vulnerabilities to create a new botnet of hundreds of thousands of devices, even if the devices are not directly reachable on the Internet.
The basic idea of evaluating the security and privacy of IoT devices indirectly by studying their companion apps has been explored by prior work. For example, Wang et al. [94] leveraged it to identify rebranded devices by searching for similar apps. They find vulnerabilities in other devices because of "private labeling" and component re-use. Chen et al. [21] and Redini et al. [80] used companion apps to inform fuzzing IoT devices, while Zuo et al. [102], Sivakumaran et al. [85,86], and Zhao et al. [99] leveraged companion apps to identify Bluetooth Low Energy (BLE) issues. Wang et al. [93] statically analyzed Samsung SmartThings apps, which are part of the SmartThings smart hub IoT ecosystem.
Existing approaches focus on re-identifying already known issues shared among multiple devices (previously discovered through traditional techniques), still require physical devices (fuzzing), focus on a subset of companion apps (BLE), or analyze conceptually simple apps that are less widespread than general companion apps [64] (e.g., Samsung SmartThings apps, which are event flow graphs, rather than full apps; similar to "If This Then That" [47]).
In this paper, we introduce a new static analysis approach, IoTFlow, that substantially advances this basic idea. Our new approach enables us to gain new fundamental knowledge about the companion apps and corresponding smart devices at scale without actually requiring the physical device. We focus on addressing two crucial limitations of state-of-the-art techniques: First, we discover new issues automatically instead of re-identifying existing issues, which would require a priori knowledge that they exist. Second, we investigate individual devices instead of assuming that groups of devices share or re-use components. Specifically, our approach enables us to infer and gain new insights into the security and privacy of companion apps and their corresponding smart devices by reconstructing information about the used network protocols, endpoints, and the data they receive. With our approach, we can answer the following important but open questions concerning security and privacy in the IoT ecosystem:
RQ1: How do companion apps and devices communicate?
RQ2: Who are companion apps communicating with?
RQ3: Which data are companion apps sharing (and how)?
Specifically, our approach (1) identifies communication trigger points, (2) uses Value Set Analysis (VSA) to reconstruct network-related information on where data is coming from or transferred to, such as the URLs that are being contacted, (3) utilizes Data-flow Analysis (DFA) to determine what data is being accessed, shared, and with whom, and (4) assesses the corresponding impact.
We evaluate our approach on 9,889 unique and manually verified companion apps [50,62,63] to show that we can analyze IoT devices accurately and at scale. Additionally, we study the differences in network behavior between the companion apps and 947 popular general-purpose apps that we collected. Finally, we verify the accuracy of IoTFlow and compare it with dynamic analysis, for which we interacted with 13 IoT devices via their companion apps. In this paper, we make the following contributions:
• We introduce IoTFlow, a new static program analysis approach utilizing Value Set Analysis (VSA) and Data-flow Analysis (DFA) to analyze the behavior of IoT devices based on their companion apps' interactions with them and their remote backend.
• We show that IoTFlow can accurately infer the network behavior of companion apps at scale by analyzing 9,889 IoT apps.
• We analyze how and with whom companion apps communicate, what data they share locally with devices and remotely, and we highlight their differences to general apps.
• Using IoTFlow, we automatically discover rampant security and privacy issues in the IoT ecosystem, such as abandoned control domains, hard-coded credentials, expired certificates, or shared Personally Identifiable Information (PII).
Artifacts. To foster reproducibility and future research, we make our open source implementation and analysis artifacts available at https://github.com/SecPriv/iotflow.

MOTIVATION
In the following, we motivate IoTFlow with the need for at-scale IoT device behavior analysis, the interdependence of companion apps and IoT infrastructure, and the unique features of companion apps compared to general-purpose apps.
Large-scale IoT Device Behavior Analysis. The plethora of security and privacy issues that supposedly plague smart devices are a well-hypothesized problem in the security community and often anecdotally confirmed when yet another real-world issue is found and the press is reporting on it. Unfortunately, we currently lack techniques to discover such issues and also other vulnerabilities in smart devices automatically and at scale. State-of-the-art approaches focus on analyzing the devices' firmware [27], requiring tedious and substantial manual effort to tailor it to each individual device, possibly even each hardware revision of a device. It also suffers from the many challenges of analyzing firmware, such as having to deduce and infer what sensors and actuators exist, model them, and understand how the firmware is communicating with them. Even if it were feasible to scale such approaches to the many devices, it is also challenging to automatically gather thousands of firmware images, as devices use different processes to retrieve and update their firmware. At the same time, more and more IoT devices are being manufactured and used. Thus, it remains an open problem how to analyze the increasing number of diverse devices.
For large open source projects, the average lifetime of vulnerabilities is multiple years [1,55]. Considering the profit-driven nature of the IoT ecosystem, it appears likely that security is indeed an afterthought in the IoT ecosystem and vulnerabilities might remain unpatched similarly long or even longer. Automated large-scale analysis allows us to promptly identify vulnerabilities and mitigate them. Moreover, even when automated analysis cannot replace in-depth analysis, it still helps developers to identify issues and address them. Being able to accurately analyze how IoT devices truly behave also informs privacy policy and behavior of (privacy-conscious) consumers. Practical large-scale automated analysis provides the much-needed foundation and knowledge to better understand IoT devices and improve their security and privacy.
IoT Control Infrastructure. The fundamental idea of smart devices is that they coordinate and cooperate with other devices, that is, they do not work in isolation. Typically, the devices communicate with companion apps, smart hubs, or remote cloud-based backends (see Figure 1), the latter of which may be distributed over different regions world-wide [83]. Users interact with the devices almost exclusively via their companion apps. If a device supports Wi-Fi, then the app may communicate with the device over the local network or the Internet. If a device does not support Wi-Fi but only uses Bluetooth, then all device-to-cloud communication needs to pass through the app or a hub. Moreover, due to missing user interfaces, updating a device's firmware frequently happens via the app [94]. That is, the apps play a central role during the setup, operation, and update of the devices. In fact, many devices cannot be set up without using a device that can run the app. Thus, apps must contain some information about the devices and their behavior, and they provide a unique analysis opportunity.
General-purpose Apps vs. Companion Apps. Compared to general-purpose apps, companion apps face different challenges and introduce new threats. Generally, mobile operating systems restrict access to sensitive data and sensors (e.g., through Android or iOS permissions). However, this does not apply to data collected through smart devices. Users also lack visibility and control over the data the devices collect and share. It is crucial to investigate the threat of collusion between device and app, especially because it circumvents existing defenses and allows building more accurate user profiles by combining PII and data collected by both [82].
Advertisements (ads) and trackers to collect user data for behavioral targeting appear widely in general-purpose apps [79,82,90]. These services are attractive for developers to monetize their apps [41]. For companion apps, one might assume that the business model centers around selling the devices. However, related studies showed that these apps and even devices themselves include ads and tracking [60,81,91]. In hindsight, considering the IoT environment and collusion potential, this makes sense: It is additional income. For example, companion apps can interact with the local network to discover and manage devices (a permission often required to set up the device), which is data general-purpose apps have difficulty collecting, and which is also useful for advertisement or tracking [52]. Prior work on network behavior and PII leakage of apps mainly considers traffic sent to remote servers. For IoT devices that use local communication, via Bluetooth or Wi-Fi, app-to-device or device-to-app communication has additional significance [87]. A smart device only using Bluetooth can collude with a companion app to "clean" sensitive data: receive it, encode it in some way, and send it back to the app, which sends it to the tracker. Existing ways to identify and block such behavior in general-purpose apps cannot address the challenges of the IoT environment, like collusion.

IOTFLOW
We introduce IoTFlow, a new static analysis approach for companion apps. We aim to better understand the behavior of IoT devices without requiring the physical device.
IoTFlow itself has two main phases (see Figure 2): Value Set Analysis (VSA) and Data-flow Analysis (DFA). With VSA, we identify trigger points, that is, sources and sinks of interesting (network) activities. This appears trivial at first, but it is important to realize that (1) we expect a substantial amount of communication, as smart devices are meant to communicate and coordinate extensively, and (2) we need to be able to determine the communication endpoint. For example, a user might expect and accept that the companion app shares their location to turn on their heating when they are on their way home. But most users would likely object if it is sent to an advertisement company. Enumerating all potential sources and sinks will lead to inaccurate results and render the analysis impractical. Instead, we need to distinguish where apps send data, to the device or a remote service, to which services, and utilizing which network protocols. We accurately reconstruct this information leveraging VSA (Section 3.1) and use it to identify precise sources and sinks for our DFA (Section 3.2).
For reconstructed endpoints, which may be third-party services, we then (1) categorize them based on their purpose, (2) analyze their geographic locations, and (3) test for abandoned domains. This allows us to evaluate if communication would be expected and assess their security and privacy impact. For example, a privacy-conscious user within the EU may not expect that their device sends data to a country not bound to the GDPR. Similarly, abandoned domains can lead to devices being taken over by attackers [17,24,74].
With DFA, we can then precisely assess which data companion apps share, with whom they communicate, and how. Specifically, we analyze the data-flow for data from the identified and categorized trigger points as well as from sensitive data sources (e.g., GPS location) to relevant sinks.

Motivating Example. Considering the examples in Listing 1 and Listing 2, we (1) need to reconstruct the destination of the Message Queuing Telemetry Transport (MQTT) broker (Listing 2, line 15), and (2) trace the data flow from the Bluetooth source (Listing 1, line 3) to where the message is published (Listing 2, line 18). An additional challenge is that the data is passed from Listing 1 line 6 to Listing 2 line 7 via Inter Component Communication (ICC). Traditional approaches would miss this example. However, we can reconstruct the keys of the ICC during VSA and then bridge the connection via the reconstructed keys, enabling us to perform more precise DFA across the ICC boundary.

Value Set Analysis
Value Set Analysis (VSA) is a program analysis technique to reconstruct values at specific program points. We utilize it to gain insights about the communication of IoT apps and to accurately handle ICC for our DFA. VSA has been used by related work before [101,102], however, with the focus on reconstructing Application Programming Interface (API) keys or Universally Unique Identifiers (UUIDs) of BLE to identify vulnerable implementations of the BLE pairing process. That is, related work reconstructed primarily strings using manually derived rules, while IoTFlow supports arbitrary objects (as is required to precisely reconstruct endpoints, like in Listing 2).
Pre-Processing ➊. We implemented our IoTFlow prototype in Java and target Android. We use Soot [89] to parse Dalvik byte code from Android apps. It translates the byte code into the Jimple Intermediate Representation (IR), which simplifies our analysis (e.g., by splitting nested instructions). Notably, both Kotlin and Java Android apps are compiled into Dalvik code, and, in turn, IoTFlow can readily analyze both types of apps. In preparation for the forward computation step, we also translate the Dalvik byte code into Java byte code with dex2jar [71] because Java cannot load classes directly from the Dalvik byte code.
Identification of Sinks ➋. IoTFlow starts at interesting sinks and traces their values backward. We analyze network-related sinks from Android, Java, and 19 manually selected popular network communication libraries, focusing on IoT application layer messaging protocols (e.g., MQTT, Constrained Application Protocol (CoAP), Advanced Message Queuing Protocol (AMQP), and Extensible Messaging and Presence Protocol (XMPP)) [11,70]. Additionally, we consider ICC and cryptographic methods as sinks. We later use the reconstructed ICC information to bridge the ICC boundary during DFA. As apps might encrypt data before sending it, we also examine cryptographic methods.
Backward Tracing ➌. We then trace back through the program, starting at the identified sinks to all program points where the app modifies the values we are interested in. Naturally, this yields an over-approximate trace set. For example, if we want to reconstruct the parameter passed to MqttManager in Listing 2, then our reconstruction starts at line 15. We trace back the value of config.endpoint to line 14, to line 8, to lines 1-3, until we have traced all variables on which config depends.
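The backward tracing step can be illustrated with a minimal sketch. It is not IoTFlow's implementation: the statement tuples below are a hypothetical stand-in for Jimple, and the line numbers, variable names, and values are invented for illustration, loosely modeled on the MqttManager example.

```python
from collections import deque

# Hypothetical three-address statements: (line, defined variable, used variables).
# Illustrative stand-in for Jimple, not IoTFlow's actual data model.
STATEMENTS = [
    (1, "host", []),                   # host = "broker.example.org"
    (2, "port", []),                   # port = 8883
    (3, "label", []),                  # label = "unrelated"
    (8, "config", ["host", "port"]),   # config = new MqttConfig(host, port)
    (14, "endpoint", ["config"]),      # endpoint = config.endpoint
    (15, None, ["endpoint"]),          # new MqttManager(endpoint)  <- sink
]


def backward_trace(statements, sink_line):
    """Return the lines the sink's arguments depend on, in program order."""
    by_line = {line: (target, uses) for line, target, uses in statements}
    worklist = deque(by_line[sink_line][1])  # start with the sink's arguments
    seen_vars = set()
    trace = {sink_line}
    while worklist:
        var = worklist.popleft()
        if var in seen_vars:
            continue
        seen_vars.add(var)
        for line, target, uses in statements:
            # Every earlier definition of the variable joins the trace.
            if target == var and line < sink_line:
                trace.add(line)
                worklist.extend(uses)
    return sorted(trace)


print(backward_trace(STATEMENTS, 15))  # [1, 2, 8, 14, 15]
```

The unrelated definition on line 3 is not part of the trace. A real analysis additionally has to follow calls, fields, and branches, which is where the over-approximation comes from.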
[Listing 2, caption fragment: The data flow from ICC source to sink, completing the flow from source to ICC sink of Listing 1, is marked through green statements. We highlight the reconstructed ICC key "device" yellow again.]
Forward Simulation ➍. In the next step, we reconstruct the actual value set. Here, we must reconstruct arbitrary objects passed to the sinks, or we would miss the value of config in our example. That is, only reconstructing string operations is insufficient. Instead, we adapt our value reconstruction to handle arbitrary objects from any classes defined by the app, such as the MqttConfig class. We utilize reflection and forward simulate the backward trace, using the classes and methods as the app would do while normally executing it. Using reflection for simulating execution paths has a further advantage: We can handle code where the app itself uses reflection, which prior work cannot. However, reflection also introduces new challenges that we need to address:
(1) Android Methods. Some data might not be available statically, like user input. Additionally, we cannot simulate Android methods with reflection because only stub implementations are available and we use placeholders instead (e.g., intents, shared preferences, and database).
(2) Non-Terminating Methods. Simulating arbitrary methods with reflection can also lead to non-termination, such as when it waits for an IoT device to connect. We mitigate this issue by terminating it after two seconds. We determined this threshold empirically as a trade-off between precision and time. In practice, most instructions finish within a fraction of a second.
(3) Partially Reconstructed Values. Partially reconstructed values can cause us to miss values. We may simulate a substring operation, but the analysis does not reconstruct the whole base string because parts depend on dynamic values that we cannot determine statically. This can then result in an out-of-bounds exception, which would cause us to miss more values. For example, if a URL obtained dynamically would contain a 32 character device serial number, but our placeholder is from_pref, then the analysis may cause an out-of-bounds exception if it accesses index 9. We mitigate this issue by preempting the calls that can cause such issues and expand the value on demand. Notably, this is not limited to string operations, but also extends to accessing arbitrary member fields of objects. For missing parameters or base objects, we attempt to create them with their default constructors. For primitive data types (e.g., boolean, int, or float), we assign default values.
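The on-demand expansion of partially reconstructed values can be sketched as follows. This is only an illustration under assumptions: the helper names expand and char_at and the padding character are hypothetical and not taken from IoTFlow, and the sketch covers strings only, not member fields of objects.

```python
def expand(value: str, needed_length: int, pad: str = "?") -> str:
    """Pad a partially reconstructed value so reads up to needed_length succeed."""
    if len(value) < needed_length:
        value += pad * (needed_length - len(value))
    return value


def char_at(value: str, index: int) -> str:
    """Preempt an out-of-bounds read by expanding the value before indexing."""
    return expand(value, index + 1)[index]


placeholder = "from_pref"        # 9 characters, valid indices 0..8
print(char_at(placeholder, 3))   # within bounds: m
print(char_at(placeholder, 9))   # plain indexing would raise IndexError: ?
```

The padding character marks the positions that were not reconstructed, so later steps can still tell the known prefix apart from the filler.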
Local vs. Remote Endpoints ➎. A unique aspect of companion apps is that their communication can be local, to connect to IoT devices or hubs, or remote, to connect to remote backends. We need to distinguish these classes to answer what data they share, how, with whom, and what the security and privacy impact is. Therefore, we categorize endpoints as certainly local or possibly remote. That is, we identify local connections by checking whether a reconstructed endpoint points to a local IP address, a broadcast address, a multicast address, or the domain originates from user input (fromUI.local). We consider all other endpoints as remote.
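The categorization into certainly local and possibly remote endpoints can be sketched with Python's standard ipaddress module. This is a minimal sketch: the function name classify_endpoint is hypothetical, and treating private, loopback, and link-local ranges as "local IP addresses" is an assumption about what the check covers.

```python
import ipaddress

USER_INPUT_PLACEHOLDER = "fromUI.local"  # placeholder for hosts taken from user input


def classify_endpoint(host: str) -> str:
    """Classify a reconstructed host as 'local' (certainly) or 'remote' (possibly)."""
    if host == USER_INPUT_PLACEHOLDER:
        return "local"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return "remote"  # a domain name that cannot be shown to be local
    # Assumption: private, loopback, and link-local ranges count as local.
    if addr.is_private or addr.is_loopback or addr.is_link_local:
        return "local"
    if addr.is_multicast:
        return "local"
    if addr == ipaddress.ip_address("255.255.255.255"):  # limited broadcast
        return "local"
    return "remote"


for host in ["192.168.1.10", "239.255.255.250", "fromUI.local", "8.8.8.8", "api.example.org"]:
    print(host, classify_endpoint(host))
```

Defaulting to "remote" whenever locality cannot be shown mirrors the conservative split into certainly local and possibly remote.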

Data-Flow Analysis
In the second phase of IoTFlow, we use DFA to trace data flows from IoT devices and sensitive Android methods. IoTFlow builds on FlowDroid [12], which is a data-flow framework for Android. We extended it to address the unique challenges of the IoT ecosystem. We (1) connect reconstructed endpoint information to data sent or received, similar to pointer analysis, and (2) trace flows across ICC.
Considering how modern apps work internally, we must pay particular attention to ICC. It is now the recommended way for app components to communicate with each other and often used, which is why tracing data flow across it is crucial. Theoretically, FlowDroid supports ICC via ICCTA [56]. However, ICCTA cannot generate ICC models for current Android apps [66,98], which prevents FlowDroid from tracing flows through ICC. IoTFlow addresses this blind spot by treating ICC as sources and sinks, and connecting an ICC sink (writing to a key) to the corresponding ICC sources (reading from the key) by reconstructing the key used in ICC through VSA.
Connecting Reconstructions ➏. After reconstructing network endpoint information with VSA, we must connect them to the points where the app adds data to the request objects or receives a response, as these might be different from where the endpoint is set. For example, the endpoint might be set during initialization of a connection object that is later used (repeatedly) to send or receive data. We identify the points where the app receives data and use the receiving statement as communication trigger points. Similarly, we need to connect a request's destination with the request's data when the request is executed. We do so using multiple data-flow analysis runs, which we split by method type for easier parallelization (e.g., MQTT, UDP, or CoAP). Returning to Listing 2, we previously reconstructed the MQTT broker endpoint via VSA (line 15). For our DFA in the next step, we now associate the MQTT broker endpoint (line 15) with the sink publishString (line 18).
Direct Data Flows (Source to Sink) ➐. We are interested in data flows from sources that are (1) Bluetooth, (2) responses from the local network, or (3) sensitive Android methods. We trace them to (1) ICC sinks and (2) remote sinks, that is, data leaks.¹ Bluetooth data is interesting as it may contain data from smart devices and local network communication is likely data from smart devices.
Crucially, we need to treat flows to and from the same method differently depending on the context and how the app uses the method (e.g., we want to analyze local network responses but ignore responses from remote endpoints). Thus, we extended FlowDroid to support context-sensitive flow analysis. We precisely identified the methods and the context that we need to consider as trigger points with the help of our VSA and by Connecting Reconstructions ➏, which we can utilize to understand potential data leaks.
¹ Full list of sources and sinks: https://github.com/SecPriv/iotflow/tree/main/config
We focus first on three types of straightforward immediate flows: (1) Bluetooth to network, (2) local network to network, and (3) sensitive data to network. Additionally, we trace sources to ICC sinks, to analyze flows across ICC, giving us three more flow types: (4) Bluetooth to ICC, (5) local network to ICC, and (6) sensitive data to ICC. In our example in Listing 1, IoTFlow identifies the flow from the Bluetooth source bcg in line 3 via line 5 to line 6, where the data value is passed to the intent using the key BLE_DATA (reconstructed via VSA).
Indirect Data Flows (Source to ICC to Sink) ➑. Finally, we need to follow up on the flows we identified that have an ICC sink, to properly bridge the ICC boundary. We trace the additional flow type (7) ICC source to network sink, and then precisely connect the new flows with previously identified flows of types (4)-(6). This allows us to discover and analyze data leaks involving ICC. For our examples Listing 1 and Listing 2, based on Direct Data Flows ➐, we identified a flow from Bluetooth to ICC using the key BLE_DATA. In Listing 2, using our indirect flow analysis, we now identify the flow from the ICC source getStringExtra() in line 7 to line 9 to line 17 to line 18, where the app sends the Bluetooth data to the MQTT broker. Last, we connect the new ICC to network flow to the previously identified Bluetooth to ICC flow leveraging the VSA reconstructed ICC keys, giving us the indirect data flow that crosses the ICC boundary from Bluetooth to ICC to network.
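Connecting flows of types (4)-(6) with flows of type (7) amounts to a join on the reconstructed ICC key. The sketch below shows that idea only. The function name bridge_icc, the tuple layout, and the example endpoints are hypothetical and not taken from IoTFlow or FlowDroid.

```python
from collections import defaultdict


def bridge_icc(flows_to_icc, flows_from_icc):
    """Join source-to-ICC flows with ICC-to-network flows on the ICC key.

    flows_to_icc:   iterable of (source, icc_key)        -- flow types (4)-(6)
    flows_from_icc: iterable of (icc_key, network_sink)  -- flow type (7)
    Returns a list of (source, icc_key, network_sink) indirect flows.
    """
    sinks_by_key = defaultdict(list)
    for key, sink in flows_from_icc:
        sinks_by_key[key].append(sink)
    return [
        (source, key, sink)
        for source, key in flows_to_icc
        for sink in sinks_by_key.get(key, [])
    ]


to_icc = [("bluetooth", "BLE_DATA"), ("location", "GPS_FIX")]
from_icc = [("BLE_DATA", "mqtt://broker.example.org"), ("OTHER", "https://api.example.org")]
print(bridge_icc(to_icc, from_icc))
# [('bluetooth', 'BLE_DATA', 'mqtt://broker.example.org')]
```

Only flows whose keys match on both sides are bridged, which is why the reconstruction of the keys during VSA matters for precision.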

INSIGHTS INTO THE IOT ECOSYSTEM
We evaluate IoTFlow on 10,836 apps on an Ubuntu 20.04.6 machine with 48 physical CPU cores (96 cores with hyper-threading, 2x Intel(R) Xeon(R) Gold 6342 CPU) and 1,024 GiB RAM. We limit the memory for the analysis of each app to 150 GiB (-Xmx150g).

Dataset
Verified Companion Apps. We evaluate IoTFlow on 9,889 unique IoT companion apps that were verified manually by prior work as part of three individual datasets [50,62,63]. We refer to our consolidated dataset as IoT-VER. It contains 455 apps collected by Neupane et al. [63] for studying if apps follow best practices, 5,100 apps that Jin et al. [50] used for the training, validation, and testing of IoTSpotter, and 6,208 apps that Nan et al. [62] collected and manually verified for IoTProfiler. Three quarters of the IoTProfiler apps are from the Google Play Store (74.6%), the remaining apps are from third-party stores. We did not augment these datasets with additional apps to not fragment the IoT companion app dataset space, which we deem important for reproducibility. Unfortunately, the public IoTSpotter dataset is incomplete and it misses 128 apps. Neupane et al.'s dataset misses two apps for which only the package name is available. We excluded these apps from our dataset.
All three datasets have 118 apps in common. IoTSpotter and IoTProfiler share 1,430 apps. The dataset of Neupane et al. shares 57 apps with IoTSpotter and 21 apps with IoTProfiler. If multiple datasets contain the same app, we only analyze the most recent version, that is, the app with the highest version code, since it is monotonically increasing [7]. Our consolidated dataset IoT-VER contains 9,889 apps, unique by their package names.
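The consolidation rule (unique by package name, keep the highest version code) can be sketched as follows. The function name consolidate, the package names, and the version codes are made up for illustration and do not come from the datasets.

```python
def consolidate(*datasets):
    """Merge datasets of (package_name, version_code), keeping the highest version code."""
    latest = {}
    for dataset in datasets:
        for package, version_code in dataset:
            if package not in latest or version_code > latest[package]:
                latest[package] = version_code
    return latest


# Hypothetical entries; real datasets would also carry the APK file itself.
dataset_a = [("com.example.lamp", 12), ("com.example.cam", 40)]
dataset_b = [("com.example.lamp", 15), ("com.example.lock", 3)]

merged = consolidate(dataset_a, dataset_b)
print(len(merged), merged["com.example.lamp"])  # 3 15
```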
Popular General-purpose Apps. We also downloaded 1,000 popular apps and games from the top selling free category of the Google Play Store in January 2022, which we use to illustrate the differences between IoT companion apps and other apps. We manually removed companion apps from the dataset and refer to the remaining 947 apps as GP-2022. To do so, two researchers independently classified each app based on its metadata in the Google Play Store. If they disagreed, they studied it in-depth until they reached an agreement.

Performance
We first discuss the performance of IoTFlow on our datasets (see Table 1). In addition to the total run time, we investigate the required time separately for VSA and DFA. On average, general apps take almost five times as long to analyze as companion apps (125m31s vs. 26m23s). This difference is even more pronounced when considering the median (129m36s vs. 6m51s): Companion apps are processed almost 20x faster than general apps. Reasons may be the larger code base of general-purpose apps or that they tend to have more sources and sinks that we need to consider. Overall, we consider a median analysis time of less than 7 minutes and an average analysis time of approximately 26 minutes practical.
VSA Performance. We allow up to 600 backward traces for each identified statement to prevent long-running analyses. Increasing the number of backward traces typically leads to more combinations of the same data, like request parameters. Each backward trace has up to 300 steps. We determined these thresholds empirically, observing a reasonable trade-off between resources and precision. Additionally, we configure timeouts for backward tracing (15 minutes) and forward computation (20 minutes). Our analysis only triggered the backward timeout when analyzing 11 (0.1%, all from GP-2022) apps and the forward timeout for 304 (2.81%; 155, 1.57% IoT-VER and 149, 15.73% GP-2022) apps, which we consider reasonable. Higher thresholds could lead to more flows being found.
Data-Flow Performance. For DFA, we increased the timeouts suggested by the FlowDroid authors [14] by 50%. We set the FlowDroid callback collection timeout to 7m30s and the timeout for flow analysis to 15m. Our analysis triggered the callback timeout for 2,432 apps (22.44%) and the flow analysis timeout for 3,004 apps (27.72%). Separating the two datasets, 1,847 companion apps (18.68%) and 585 general-purpose apps (61.77%) triggered the callback timeout, while 2,484 companion apps (25.12%) and 520 general-purpose apps (54.91%) triggered the flow analysis timeout.

How Companion Apps Communicate
To answer RQ1: How do companion apps and devices communicate?, we identify device-to-app communication and the involved network protocols, and we study certificate pinning.
We consider four indicators of direct device communication: local IP addresses, broadcast and multicast addresses, addresses that depend on user input (reconstructed as fromUI.local), and Bluetooth permissions. The latter indicates that the devices themselves might not have Wi-Fi capabilities, but that they use the companion app as a gateway to access the Internet. Some devices may also spawn their own Wi-Fi network that the phone needs to join for pairing. Within the network, the device has a fixed address known by the companion app. The apps can also use broadcasts to discover devices in local networks, for example, apps use Universal Plug and Play (UPnP) to find devices that support screen mirroring. A fourth method is asking the user directly. Table 2 summarizes our findings.
General-purpose Apps. We observe significantly lower numbers for all four direct device communication indicators for general-purpose apps in GP-2022. We find local IP addresses in only 2.21% of apps, compared to 14.99% in IoT-VER. Similarly, broadcast and multicast addresses drop from 4.57% in IoT-VER to 0.42% in GP-2022. Only one (0.11%) address depends on user input in GP-2022, compared to 123 (1.24%) addresses in IoT-VER. The share of apps requesting Bluetooth permissions is also lower, with 19.01% in GP-2022 compared to 64.26% in IoT-VER. These findings strengthen our assumption that our direct device communication indicators are indeed meaningful.
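The address-based indicators can be illustrated with a short sketch. It assumes that reconstructed endpoint hosts are available as plain strings and relies on Python's standard ipaddress module. The function name and the category labels are hypothetical and not part of IoTFlow.

```python
import ipaddress


def classify_host(host: str) -> str:
    """Classify a reconstructed host string (hypothetical helper)."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return "hostname"  # not an IP literal, e.g., a domain name
    # Check the limited broadcast address first, because it also falls
    # into a reserved range.
    if addr.version == 4 and addr == ipaddress.IPv4Address("255.255.255.255"):
        return "broadcast"
    if addr.is_multicast:
        return "multicast"
    if addr.is_private or addr.is_link_local:
        return "local"
    return "remote"
```

For example, the SSDP address used by UPnP discovery, 239.255.255.250, is classified as multicast. Subnet-directed broadcast addresses (such as 192.168.1.255 in a /24 network) cannot be recognized without the netmask, so this sketch treats them as local.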
Takeaways. We identified four strategies apps use to communicate locally with smart devices, and we show by comparing them to general-purpose apps that they are indeed specific to companion apps. Identifying this kind of communication helps security and privacy analyses (see Section 4.5.2). Prompting the user for the device location and using multicast can be dangerous and prone to misconfigurations. Users might unwittingly make devices accessible over the Internet [18]. A Shodan [84] query for open port 554 returns 78,858 results of exposed cameras, suggesting that misconfigured devices accessible remotely are a common issue. Attackers can also sniff broadcast packets or mimic the legitimate device to act as a Monkey-in-the-Middle (MITM) [29]. Finally, we note that any information about local network devices is sensitive and can be abused for advertising and tracking purposes [52].
We recommend using device discovery and avoiding user-configurable addresses to reduce the risk of accidental misconfigurations [30]. Apps should also respect users' privacy and not send local network information to remote servers. In fact, they should prefer local communication over cloud communication whenever possible, as remote requests can reveal usage patterns to others.

URL Protocol Schemes.
We identify network protocols based on the values we reconstructed through VSA. First, we analyze the URL schemes of the endpoints that apps communicate with. Second, as we reconstruct endpoint information for libraries for AMQP, MQTT, and XMPP communication, we can draw conclusions about them, even if they do not use specific schemes. Table 3 summarizes our results. The row IoT-related summarizes the schemes and protocols that are tailored to IoT devices. We group IPP, IPPs, RMTP, and VNC as IoT-other as we found them only in one or two apps, and we group protocols from IANA's list of URI schemes [48] that are less interesting for our use case (e.g., service, about, info) as Other. Overall, for IoT-VER, we reconstructed schemes in 7,113 unique apps for remote endpoints and in 871 apps for local communication.
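The scheme-based step can be sketched as follows, assuming the reconstructed endpoints are available as URL strings grouped by app. The function names and the set of IoT-related schemes are illustrative assumptions, not the exact grouping behind Table 3.

```python
from collections import Counter
from urllib.parse import urlsplit

# Illustrative assumption; the paper's exact grouping may differ.
IOT_SCHEMES = {"mqtt", "mqtts", "amqp", "amqps", "xmpp", "coap", "coaps"}


def apps_per_scheme(endpoints_by_app):
    """Count in how many apps each URL scheme occurs at least once."""
    counts = Counter()
    for endpoints in endpoints_by_app.values():
        # urlsplit lowercases the scheme; skip values without a scheme.
        schemes = {urlsplit(e).scheme for e in endpoints}
        schemes.discard("")
        counts.update(schemes)
    return counts


def iot_related_schemes(counts):
    """Return only the schemes from the (assumed) IoT-related group."""
    return {s: n for s, n in counts.items() if s in IOT_SCHEMES}
```

Counting each scheme once per app mirrors how the results are reported per app rather than per endpoint.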

HTTP(S). We find that apps still widely use plain HTTP. Our numbers represent an upper bound as we do not know how many actual connections occur over HTTP since we base our results on statically reconstructed endpoints. In practice, HTTP might be upgraded by default, but even if used as a fallback, HTTP can lead to security and privacy issues through protocol downgrade attacks.
Our results show that a high proportion of HTTP traffic, compared to HTTPS traffic, is for local communication. This is not surprising: Local communication might appear safe, and deploying TLS properly for IoT devices remains challenging [72]. Nevertheless, even if communication is local, TLS protects against eavesdroppers, which is important as devices use broadcast media like Wi-Fi.
MQTT Endpoints. Smart devices have unique usage scenarios and requirements, such as device-to-device communication and energy efficiency. Traditional communication protocols do not satisfy these requirements. New protocols can fit these demands, but they can also threaten security and privacy, especially if they were designed without considering an adversarial environment or if developers make wrong assumptions about them. One protocol used often by IoT devices is the Message Queuing Telemetry Transport (MQTT) protocol. In practice, it often lacks authentication and authorization, allowing attackers to access user data or take over devices [49,97]. MQTT is the most widespread IoT-specific communication protocol for IoT-VER. We reconstructed 147 MQTT endpoints in 176 apps, of which nine represent local IP addresses. We verify that the remaining 138 remote endpoints are indeed valid by opening a connection to them. To not raise any ethics concerns, we only open and immediately close the connection, and we do not perform any action (e.g., subscribing to a topic). We use the Python Paho library [31] for our test and base our results on the return code: If the connection is successfully established (return code 0) or an error related to connection parameters is returned (return codes 1 to 5), we consider the endpoint as reachable and valid.
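The reachability rule can be illustrated with a minimal sketch. Note that it is not the Paho-based test described above: to stay self-contained, it sends a hand-built MQTT 3.1.1 CONNECT packet over a plain TCP socket and reads the CONNACK return code. All function names are hypothetical, and TLS-protected endpoints (e.g., on port 8883) are not handled.

```python
import socket
import struct


def build_connect_packet(client_id: str = "probe", keepalive: int = 10) -> bytes:
    """Build a minimal MQTT 3.1.1 CONNECT packet (clean session, no credentials)."""
    cid = client_id.encode("utf-8")
    # Protocol name "MQTT", protocol level 4, connect flags 0x02 (clean session).
    variable_header = b"\x00\x04MQTT\x04\x02" + struct.pack("!H", keepalive)
    payload = struct.pack("!H", len(cid)) + cid
    remaining = len(variable_header) + len(payload)
    if remaining > 127:
        raise ValueError("sketch supports single-byte remaining length only")
    return bytes([0x10, remaining]) + variable_header + payload


def parse_connack(data: bytes):
    """Return the CONNACK return code, or None if the reply is not a CONNACK."""
    if len(data) >= 4 and data[0] == 0x20 and data[1] == 0x02:
        return data[3]
    return None


def is_reachable(return_code) -> bool:
    """Apply the rule from the text: return codes 0 to 5 count as reachable."""
    return return_code is not None and 0 <= return_code <= 5


def probe(host: str, port: int = 1883, timeout: float = 5.0):
    """Open a connection, read the CONNACK, and close again without further action."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(build_connect_packet())
            data = b""
            while len(data) < 4:
                chunk = sock.recv(4 - len(data))
                if not chunk:
                    break
                data += chunk
            code = parse_connack(data)
            if code == 0:
                sock.sendall(b"\xe0\x00")  # DISCONNECT
            return code
    except OSError:
        return None
```

As in the text, the sketch never subscribes or publishes; it only distinguishes endpoints that answer the MQTT handshake from those that do not.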
We connected successfully (return code 0) to 74 MQTT endpoints (53.62%). To further investigate the remaining 64 endpoints, we probed for other ports typically used for MQTT (1883 and 8883) with nmap [59]. Seven endpoints were closed and 37 were filtered, meaning our connection attempts were prevented at the network level. One reason may be geographical restrictions. The remaining 20 endpoints were unresponsive to ICMP echo requests and we consider them unreachable.
MQTT Credentials. IoTFlow can also reconstruct authentication credentials. Hard-coding credentials into the app can lead to attacks on the integrity and confidentiality of data by allowing an attacker to connect and publish or subscribe to topics (e.g., modifying a parameter of a physical actuator). We reconstructed 30 unique usernames and 34 unique passwords in IoT companion apps.
MQTT Topics and Payloads. Our analysis can reconstruct the topics (i.e., topics for which the phone or IoT device should receive messages) and message payload formats (i.e., the format of messages shared between phone, IoT device, and the cloud). We found 726 topic names and 330 payload formats. While we may miss dynamic values from communication with the device, the information we gain is valuable to understand the behavior of IoT apps and devices.
Other IoT Protocols. We also identified other IoT protocols in IoT-VER apps, namely XMPP, AMQP, and CoAP. Among the 36 XMPP endpoints we identified, we could connect to five, the port was filtered for six, and the remaining 25 endpoints were unresponsive to ICMP echo requests. For the six identified AMQP endpoints, we could connect to two, we received an authentication error for one, and the AMQP-specific port was closed or filtered for the remaining endpoints. For the two CoAP endpoints, one was a local IP address, but we successfully reconstructed 55 unique URL paths used to specify the location of resources on the server.
General-purpose Apps. For GP-2022, we reconstructed local addresses only in combination with HTTP, HTTPS, and WS (WebSocket). As for companion apps, only a minority uses HTTPS locally (14.29%). However, unlike IoT apps, nearly all apps (98.55%) use it for remote communication. Unsurprisingly, general-purpose apps do not use IoT protocols. Only a card game app uses MQTT and XMPP.
Takeaways. We found widespread adoption of HTTP across IoT-VER apps despite its insecurity. For GP-2022 apps, the situation improves as almost all apps communicate over HTTPS. However, in both datasets, most local communication does not adopt TLS to secure the connection. We also identified how IoT-specific protocols (MQTT, AMQP, XMPP, CoAP) are being used and we reconstructed crucial information, like credentials and topics.
Generally, apps should not use hard-coded credentials but generate them individually during initialization, use limited and narrow authorization scopes, follow best practices (e.g., encrypting Android shared preferences [8]), and encrypt all communication (e.g., via TLS, but preferably end-to-end).

Pinning and Certificates.
An additional aspect of how companion apps secure their communication is certificate pinning. It is a contentious topic: While OWASP [92] suggests it when the app wants to verify the host's identity, Google [3] advises not to adopt it because of issues deriving from certificate changes. However, determining whether it is good or bad is out of scope of our work.
We use the approach by Pradeep et al. [76] to identify pinning and the corresponding certificates by analyzing the Network Security Configuration (NSC) specified in the Android Manifest and the certificates included in the app. Table 4 summarizes our results.
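To make the NSC-based part concrete, the following sketch extracts pinned domains from a Network Security Configuration file. It is not the approach by Pradeep et al., only an illustration of the file format: it assumes the XML resource has already been decoded from the APK into plain text, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_pins(nsc_xml: str) -> dict:
    """Map each pinned domain to its list of (digest, pin) tuples."""
    root = ET.fromstring(nsc_xml)
    pinned = {}
    # iter() also visits nested <domain-config> elements.
    for config in root.iter("domain-config"):
        pin_set = config.find("pin-set")
        if pin_set is None:
            continue  # this domain configuration does not use pinning
        pins = [
            (pin.get("digest"), (pin.text or "").strip())
            for pin in pin_set.findall("pin")
        ]
        for domain in config.findall("domain"):
            pinned[(domain.text or "").strip()] = pins
    return pinned
```

An app whose configuration yields an empty result here may still pin certificates in code, which this sketch does not cover.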
IoT-Verified. More companion apps include certificates (12.21%) than use pinning (3.89%). On average, each app includes 3.21 certificates. More than half of the certificates in IoT-VER were self-signed, possibly to communicate with IoT devices. We also investigate if certificates were expired when the apps were downloaded. If the download date is unknown, we infer it based on the app versions. Our numbers are lower bounds for the apps from IoTProfiler and Neupane et al. because we assume apps were downloaded on the first day of the year when they could have been downloaded. We treat certificates as expired if their expiration date is before 2018 [...]
General-purpose Apps. Compared to IoT-VER, more apps adopt pinning (11.72% vs. 3.89%), but about the same proportion of apps include certificates (12.57% vs. 12.21%). On average, however, they include fewer certificates (1.94 vs. 3.21). One reason may be the lower number of self-signed certificates. While more than half of all certificates (57.59%) are self-signed for companion apps, only slightly more than one third (37.23%) of certificates are self-signed for general-purpose apps. At the download date, 14.59% of certificates were already expired. IoT-VER's older download date could be a reason for the increase in expired certificates in May 2023 (29.18% vs. 17.47%).
Takeaways. Comparing included, expired, and self-signed certificates, we can conclude that more certificates do not lead to better security. Companion apps include substantially more certificates than general-purpose apps, and proportionally significantly more of them are expired or self-signed. Interestingly, fewer companion apps adopt the controversial practice of certificate pinning.
Generally, developers should renew certificates well before expiration, as users may not install updates immediately. Further care is needed for self-signed certificates as apps must add code to explicitly trust them, or Android will prevent the communication that attempts to use them. Worse, doing so incorrectly, like instructing TrustManager to trust every certificate, enables MITM attacks [3].

With Whom IoT Apps Communicate
After analyzing how apps communicate, we investigate RQ2: Who are companion apps communicating with? We categorize the reconstructed fully-qualified domain names (FQDNs) and effective top-level domains+1 (eTLD+1) to spot potentially problematic endpoints, like trackers, investigate where data is sent geographically, and analyze if endpoints are vulnerable to domain takeovers.
General-purpose Apps. The average number of advertisement and tracker FQDNs per general-purpose app is 6.33, eight times higher than per companion app (0.76). Additionally, they occur in almost all apps (89.55%), while they only occur in less than one third (29.92%) of companion apps. The situation for analytics FQDNs is similar (71.70% vs. 16.65%). Most analytics and crash reporting FQDNs are shared between the two datasets, while FQDNs from other categories are mainly limited to one dataset.
Takeaways. The large number of 7,248 Other FQDNs in companion apps combined with the low number of 271 FQDNs shared with general-purpose apps (3.25% of all IoT FQDNs) suggests that many are IoT-specific. Prior work observed a low coverage of existing filter lists for IoT domains [60,88], highlighting the need for more scrutiny by future work into who receives data from these apps. Prior work showed that users value IoT security and privacy [32] and are willing to pay a premium for devices that respect their security and privacy [33]. Indeed, not using ad services or trackers could be a unique and convincing differentiating value proposition for IoT devices, especially because users already pay for the device.

Geographic Location.
Next, we determined the location of the reconstructed FQDNs to study where data is sent and which countries receive IoT data. We first resolved the FQDNs and then determined the location of the resulting IPv4 addresses by matching them against the allocated blocks [34]. We resolved them from Vienna, Austria, which is in a jurisdiction that has implemented the EU's General Data Protection Regulation (GDPR) [35]. Notably, due to geographic split horizon DNS (GeoDNS), the resolved IP addresses may differ for other vantage points. Table 6 shows aggregated geographic regions.
Takeaways. The scattered geographic location of endpoints might raise privacy concerns. Countries have implemented various data protection regulations with stricter or more relaxed requirements. For example, the EU's GDPR [35] is considered the world's strongest privacy law. If a European user downloads an app that contacts endpoints outside the EU, their data is subject to GDPR, but the app may transfer it to foreign countries and process it there. This clearly raises privacy concerns and may even be illegal. Moreover, even if no sensitive data is sent directly, metadata can suffice to infer usage patterns, which can be sensitive (e.g., for smart locks).

Abandoned Domains.
Domains that could be re-registered but are still used pose severe security and privacy risks for users as attackers could take them over. A similar argument applies to domains that are registered, but for which DNS information is stale and where the corresponding IP address could be taken over [17]. We focus on expired domains as they provide longer-term capabilities to attackers. We extract the eTLD+1 from the reconstructed FQDNs to identify abandoned domains. We then resolve the eTLD+1 to test whether they are in use. For domains we cannot resolve, we use WHOIS to check whether they are registered or free.
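The first two steps of this procedure can be sketched as follows. The sketch rests on loudly stated assumptions: the public suffix set is a tiny illustrative subset (a real analysis needs the full Public Suffix List), all function names are hypothetical, and the WHOIS check is omitted, so the result is only a list of candidates.

```python
import socket

# Tiny illustrative subset; a real analysis needs the full Public Suffix List.
PUBLIC_SUFFIXES = {"com", "org", "net", "io", "cn", "co.uk", "com.cn"}


def etld_plus_one(fqdn: str):
    """Return the registrable domain (eTLD+1), or None if it cannot be derived."""
    labels = fqdn.lower().rstrip(".").split(".")
    # Iterating from the left tries the longest candidate suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            if i == 0:
                return None  # the name itself is a public suffix
            return ".".join(labels[i - 1:])
    return None


def resolves(domain: str) -> bool:
    """Check whether the domain currently resolves via DNS."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except socket.gaierror:
        return False


def abandonment_candidates(fqdns):
    """Registrable domains that do not resolve and need a follow-up WHOIS check."""
    domains = {d for d in map(etld_plus_one, fqdns) if d}
    return sorted(d for d in domains if not resolves(d))
```

A domain that fails to resolve is not necessarily available for registration, which is why the text follows up with WHOIS and manual verification.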
IoT-Verified. We identified 136 potentially abandoned domains in companion apps. After manually investigating and removing artifacts, we verified that 67 domains from 73 apps are indeed available for registration. They are in apps for watches, TVs, cars, health equipment, security and baby cameras, lights, and locks. An attacker could take over these devices by registering the domains.
We also investigated if the 73 apps can still be downloaded from the Google Play Store. Unavailable apps remain critical, but differently so. They can still impact users as the devices might not have been replaced and they might still connect to those domains. We found that 27 apps (37.0%) are available. Remarkably, one app has over one million downloads, a second app has over 500,000, and three others have more than 100,000. For ten apps, based on the reconstruction information, it is likely the domains receive IoT data. They use IoT information in URLs, such as ipcDeviceIdList as a request parameter, or petinfoDatas/addpet as a path. Eight apps use abandoned domains to download files, which may be executed or could be device updates. Sixteen domains are API endpoints and also likely receive sensitive data. We responsibly disclosed our findings to developers and the Google Play Store.
Takeaways. Pariwono et al. [73] investigated abandoned domains for general apps, but the dangers can be more serious for IoT devices. Attackers could not only take over the apps and receive PII, but they might also be able to control hundreds of thousands of devices, enabling large-scale distributed denial-of-service attacks and allowing them to create botnets. Our analysis shows (1) that abandoned domains are a real danger in the IoT ecosystem, (2) that they affect a varied range of devices, and (3) what data they receive.
Developers should actively monitor the domains that their apps may contact, including those of third-party libraries. Additionally, old or deprecated domains that may still be contacted should remain registered and be made inoperable rather than released for others to register, as users may depend on outdated app versions.

What Data Companion Apps Share
To answer RQ3: Which data are companion apps sharing (and how)?, we first report what data apps can access, based on the requested permissions, to understand what data they could share. We then analyze the data flows we extracted to identify leaked data and whether encryption is used to protect data.
Permissions.
We extract permissions and protectionLevel with Androguard [28]. We focus on permissions with a protectionLevel of dangerous (permissions protecting sensitive resources) and privileged (permissions that third-party apps should not adopt).
Additionally, 2,604 (26.33%) IoT apps request one or more privileged permissions, while only 73 (7.71%) GP-2022 apps do. Even if system permissions are requested, they will only be granted if the phone is rooted or if the app has a special entitlement (e.g., the phone vendor may grant such an entitlement to their own apps, and they might also produce IoT devices). They can also occur for backward compatibility reasons or be remnants from development that were never removed (e.g., the second most common privileged permission is READ_LOGS, which appears in 998 IoT apps).
Finally, 5,660 (57.24%) IoT apps use "non-standard" permissions. The most frequent permission belongs to Google Cloud Messaging (GCM) (3,656 apps, 36.97%) and is used when receiving a broadcast from GCM. We also find permissions of specific brands, for example, Huawei (60 permissions occur 1,207 times in 424 apps) or Sony (31 permissions occur 787 times in 424 apps).
Takeaways. General-purpose apps request more permissions than companion apps on average. However, IoT apps use more privileged and dangerous permissions, with two of the most requested dangerous permissions being for the user's geographic location.
We recommend regularly reviewing whether permissions are still current, requesting the least necessary set of permissions, and acquiring them only temporarily when needed. Permissions might also change with new Android updates; for example, scanning for Bluetooth devices required location permissions only up to Android 12 [6,42].

Data Flows.
To learn more about what data is sent, we analyze the flows we discovered via DFA. Table 7 summarizes our findings. We distinguish between three flow types based on the source: Bluetooth, local network, or a sensitive Android API. We determine where the data is sent by connecting the VSA results with the individual flows. Unfortunately, we may not have precise information for all flows for two reasons: (1) VSA might not reconstruct an endpoint precisely, for example, because it depends on dynamic values, and (2) we might not be able to connect reconstruction and flow.
We identified data flows from Bluetooth and local network sources only for IoT apps, which is not surprising, as we have shown that such communications are specific to companion apps (see Section 4.3.1). Overall, we found 579 flows from Bluetooth sources in 90 apps. Remarkably, 497 (85.84%) of these flows involve ICC, which highlights the need for DFA that is ICC-aware, like our approach. We precisely identified endpoint information (i.e., where the data is sent to) for 50 (8.64%) flows. For local network sources, we discovered 75 flows, of which four (5.33%) involve ICC. IoTFlow reconstructed precise endpoint information for 49 (65.33%) of them. Finally, we identified 6,706 flows from sensitive Android APIs in 1,682 (17.01%) IoT apps, and 1,366 such flows in 318 (33.58%) GP-2022 apps.
Case Study: Smart Grill. Our analysis finds a flow in a companion app for smart grills. The app reads data from the device via BLE, parses it, processes it via an intent, and later sends it to an Amazon AWS endpoint via MQTT. We successfully connected to the endpoint without requiring credentials (anonymously) (see Section 4.3.2). This means that we could potentially receive data from others (for ethical reasons, we did not explore this further).
Case Study: Smart Camera. In a smart camera companion app, we found a flow from getDeviceId to a remote endpoint. The app uses the IMEI together with a username and password for authentication. Notably, the app hashes the password using MD5, which is insecure and cryptographically broken. Afterward, the app encrypts the username and password with 3DES, which is also insecure and cryptographically broken. IoTFlow's VSA reconstructed the key, even though the app developers put one byte of the key into a different class file, potentially trying to obfuscate it and avoid regex-based key recovery. Sharing the IMEI is problematic because it is a non-resettable hardware identifier that users can only change by physically replacing the device. Google strongly discourages developers from using any hardware identifiers, including the IMEI [5], and it is also prohibited by Android's user data policy [40]. With Android 10 (API level 29, released in 2019), Google added additional restrictions to access the IMEI [5,9], but around 14.4% of users are still using older versions, allowing apps to access these identifiers [15].
Table 7: Flow Analysis. We separated the flows by their categories. The ICC-Flow columns represent the flows involving any ICC, and the endpoint columns the flows with additional endpoint information. The ratio concerns the number of flows from the category. The app columns show the number of apps with the respective flows and the relation to the apps in the dataset.
Geographic Location. We also analyze the geographic location of data flows with endpoint information. For IoT-VER, we find 917 (15.04%) flows sending data to Chinese and 604 (9.90%) to US endpoints. The share of flows with US endpoints increases for general-purpose apps (75, 27.88%), while the share for Chinese endpoints drops (12, 4.46%), in line with their distribution (see Section 4.4.2). Positively, as we conducted our experiments from the EU, most destinations are within the EU: 73.80% for IoT-VER and 67.66% for GP-2022. Unfortunately, it also means that more than 25% of destinations are outside the EU, potentially violating GDPR. The situation is worse for Bluetooth-based sources than it is for local network flows. For Bluetooth, 27 endpoints (45.76%) are US endpoints, and for local network flows, 37 endpoints (52.86%) are Chinese endpoints. Our artifact provides more details.²
Takeaways. With the help of IoTFlow's combination of VSA and DFA, we identified real-world security and privacy issues in the IoT ecosystem and we discussed what data companion apps leak and where they send it, which we illustrated with two examples.
Following best practices and for privacy reasons, developers should minimize the data they collect and use, and only send data if it is truly necessary. Generally, we recommend processing as much data as possible locally and encrypting any data leaving the devices.

Encryption Analysis.
Finally, we analyze the encryption algorithms apps use and we investigate the reconstructed data passed to cryptographic methods. We reconstructed the algorithms in 812 (85.74%) GP-2022 apps and in 4,069 (41.15%) IoT-VER apps. Table 8 summarizes our results. AES is the most widely used encryption algorithm for IoT apps (92.97%) and GP-2022 apps (99.38%). Algorithms that are considered insecure or cryptographically broken are much more prominent in IoT apps using encryption (1,461 apps, 35.80%) than they are in GP-2022 apps (135, 16.63%).
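How reconstructed algorithm names could be flagged can be sketched as follows, assuming the reconstructed values are Java cipher transformation strings of the form algorithm/mode/padding. The set of weak algorithms and the decision to flag ECB mode are assumptions made for illustration; they are not necessarily the criteria behind Table 8, and the function names are hypothetical.

```python
# Assumed list for illustration; the paper's exact criteria may differ.
WEAK_ALGORITHMS = {"DES", "DESEDE", "3DES", "TRIPLEDES", "RC2", "RC4", "ARCFOUR"}


def parse_transformation(transformation: str):
    """Split 'algorithm/mode/padding' into upper-cased parts (missing parts are None)."""
    parts = transformation.strip().upper().split("/")
    algorithm = parts[0]
    mode = parts[1] if len(parts) > 1 else None
    padding = parts[2] if len(parts) > 2 else None
    return algorithm, mode, padding


def is_weak(transformation: str) -> bool:
    """Flag weak algorithms and, as an additional assumption, ECB mode."""
    algorithm, mode, _ = parse_transformation(transformation)
    return algorithm in WEAK_ALGORITHMS or mode == "ECB"
```

For example, `is_weak("DESede/CBC/PKCS5Padding")` flags the 3DES usage described in the smart camera case study, while `is_weak("AES/GCM/NoPadding")` does not raise a flag.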
We also evaluate reconstructed encryption keys. Unfortunately, removing false positive artifacts is extremely challenging because it is difficult to determine whether a key is truly used as is. For example, a 16-byte array with all 0 values could be an insecure [...]
² https://github.com/SecPriv/iotflow/tree/main/scripts/evaluation/dfa
Takeaways. The main differences between IoT apps and GP-2022 apps are in which encryption algorithms they use and how they use them. Using hard-coded keys and broken encryption algorithms gives a false sense of security and does not provide security or privacy. Unfortunately, both issues are worse for companion apps.
Beyond using strong encryption algorithms, we also recommend initializing encryption keys on demand and storing them securely, for example, with the help of Android KeyStore [4].

IOTFLOW VS. DYNAMIC ANALYSIS
Our static analysis approach has some limitations, especially because we do not require access to the device. It is crucial to understand what and how much information we can truly reconstruct from companion apps without the device. We verify the accuracy and completeness of the reconstructed values by comparing the results we obtained through IoTFlow with our in-depth manual analysis when interacting with apps and devices. To this end, we recorded traffic when using 13 different devices and their companion apps in our lab environment (see Table 9).
Our test environment uses a machine running Ubuntu 20.04 with frida-tools [78] and mitmproxy [26], and a rooted Google Pixel 4 running frida-server [78] on Android 12. The machine hosts a Wi-Fi network to which the phone and the devices connect, providing Internet connectivity through an Ethernet connection. We bypass certificate pinning via Frida's built-in scripting. We test companion apps with two strategies. First, with automatic inputs, we test apps with the Application Exerciser Monkey (AME) [10] for 10 minutes or until they crash, whichever occurs first. We do not [...] [22] and we include it for completeness. Second, with manual inputs, we manually interact with each app for 30 minutes and trigger all functionalities, including pairing and interacting with IoT devices, changing their settings, etc. Indeed, we observed significantly less traffic with AME than with manual interaction, demonstrating the scalability issues of dynamic analysis.
From the observed traffic, we extract requests' domain names, which correspond to who receives data, and resource paths, which correspond to functionality (e.g., API endpoints). We first match them between IoTFlow and dynamic analysis based on exact string equivalence. Considering the configuration of our dynamic environment, we also manually match domains and paths by (a) identifying over-approximate placeholders, such as the device product code, serial number, etc. and matching them to concrete dynamic information, (b) generalizing the dynamic system configuration, like language and locale, (c) grouping repeated dynamic data (as they also do not provide new information in the dynamic analysis setting, but provide a false sense of accuracy), and (d) resolving network-level redirects (e.g., DNS-based or IP anycast). We remove analysis artifacts that are clearly not related: (a) domains and resources that were requested from outside of the IoT app, such as by the Android operating system (e.g., background update checks), (b) domains and resources that were requested by Android WebView components unrelated to device behavior (e.g., opening a vendor's online shop website), and (c) invalid domain names. We retain all data that cannot be clearly attributed to dynamic analysis artifacts, making our results a lower bound. Last, as we focus on IoT-related behavior, we manually label data as related if it relates to IoT device behavior, security, privacy, or data exchange. Our artifact provides further details on the identified domains and paths, and their matching.³
IoTFlow extracted 214 domains from the 13 companion apps, with a minimum of 3 domains, an average of 16.46 domains, and a maximum of 42 domains per app. With dynamic analysis, we observed 218 domains, with a minimum of 1 domain, an average of 16.77 domains, and a maximum of 48 domains per app. Between static analysis and dynamic analysis, 36 domains match exactly and we matched 7 additional domains manually.
³ https://github.com/SecPriv/iotflow/tree/main/dynamic_analysis
We categorize all domains using our previous approach (see Section 4.4). Table 5 summarizes our results and our artifact provides further details.³ Notably, we find substantially more advertisement domains via dynamic analysis than through IoTFlow. This is expected because of how modern ads are targeted and auctioned, requiring dynamic information. IoTFlow only recovers the entry point for ads, but this is actually sufficient to determine that they are used. It also highlights an important issue: Considering all domains gives a false sense of accuracy in favor of dynamic analysis, as many of these domains may not provide new insight. For example, while it confirms our findings of extensive tracking in IoT apps, the IKEA companion app contacts 48 domains in total, but these include 20 advertising domains and four social network domains.
Focusing on domains that are certainly IoT-related, IoTFlow and dynamic analysis share 21 domains across all apps (min 0, avg 1.62, max 5), IoTFlow identified 33 domains that dynamic analysis missed (min 0, avg 2.54, max 10), and dynamic analysis found 19 unique domains (min 0, avg 1.46, max 4). That is, IoTFlow performs better than or equal to dynamic analysis for 9/13 devices and worse for only 4/13 devices (Fitbit smart watch, Hama light bulb, Soundcore headphones, and Wiz light bulb). For 2/4 of these apps, Fitbit and Wiz, IoTFlow correctly identifies the effective TLD of all domains we observed dynamically, that is, the operator, but it missed some subdomains. For Hama, it misses four IoT endpoints that we saw dynamically, likely because the device is a rebranded IoT device. For Soundcore, it misses one dynamically generated domain pointing to the device's most recent firmware.
Beyond domains, we also compare requests' paths. This allows us to assess which approach is more promising to comprehensively understand IoT device behavior, that is, whether one discovers more IoT-related functionality or whether they identify distinct (overlapping) sets of behavior. Both approaches identified the same 50 IoT-related paths over all 13 apps (min 0, avg 3.85, max 17). We statically identified an additional 231 IoT-related paths (min 0, avg 17.77, max 45) and 496 general paths (min 2, avg 38.15, max 77). Dynamic analysis found 110 unique IoT-related paths (min 0, avg 8.46, max 32) and 337 general paths (min 1, avg 25.92, max 54). For three apps (Fitbit, Hue, and Wiz), our static analysis performs worse. Fitbit and Wiz use annotations to construct paths, which we cannot analyze, a limitation we share with the state of the art (see Section 6). For Hue, our approach extracts 2 IoT-related paths, while we observe 3 paths dynamically. IoTFlow performs better or equal for 10/13 apps, with a factor of at least 1.14x (Divoom, 41 vs. 36) and up to 31x (Flowercare, 31 vs. 1). For the IKEA app, dynamic analysis did not find any IoT-related paths, while IoTFlow found 45 paths.
Overall, IoTFlow extracts more IoT-related behavior statically from companion apps than dynamic analysis (54 domains and 281 paths vs. 40 domains and 160 paths): It performs better for most apps (9/13), comparably for one app, and slightly worse for the remaining apps (3/13).
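The totals in this summary follow directly from the shared and unique counts reported earlier in this section. The sketch below shows the underlying set comparison and the arithmetic; the function names are hypothetical, and the exact-match comparison stands in for the combination of exact and manual matching described above.

```python
def compare(static_items, dynamic_items):
    """Split two collections into shared, static-only, and dynamic-only sets."""
    static_set, dynamic_set = set(static_items), set(dynamic_items)
    return {
        "shared": static_set & dynamic_set,
        "static_only": static_set - dynamic_set,
        "dynamic_only": dynamic_set - static_set,
    }


def totals(shared: int, static_only: int, dynamic_only: int):
    """Return (total found statically, total found dynamically)."""
    return shared + static_only, shared + dynamic_only


# Counts reported in the text for IoT-related domains and paths.
domain_totals = totals(shared=21, static_only=33, dynamic_only=19)   # (54, 40)
path_totals = totals(shared=50, static_only=231, dynamic_only=110)   # (281, 160)
```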
IoTFlow Findings. Taking an in-depth look into IoTFlow's security and privacy findings for the 13 apps, we find that:
• 8/13 apps send information via unencrypted HTTP to third parties, which an attacker could eavesdrop on or modify (e.g., if they are on the network path or the same wireless network). If unencrypted data is used to configure or update the device, then taking over control could be possible [68]. The NUT Find3 item tracker retrieves notifications over unencrypted HTTP, which can allow an attacker to modify a user's notification (e.g., to show that a lost item was found and where or that it has moved away).
• 5/13 apps use hard-coded symmetric encryption keys (e.g., for AES), which allows attackers to eavesdrop on their communication and can allow them to impersonate the remote end (e.g., to push configurations or updates by extracting the keys from the companion app) [67, 69].
• 2/13 apps send the hardware identifiers (IMEI) to countries outside of the EU, that is, outside of the GDPR region (one might send it to Russia and one to China to a remote endpoint that indicates tracking), using an API that is deprecated (see Section 4.5.2).
• 5/13 apps, while less critical, use country-level location information and send this to remote endpoints.
• No apps use hard-coded authentication credentials, but this does not imply that they are secure because they might not use any authentication at all.

LIMITATIONS AND FUTURE WORK
IoTFlow has limitations inherent to static analyses. Additionally, we utilize the existing frameworks Soot and FlowDroid, and we inherit their limitations. For example, our resilience to obfuscation is limited, which can affect signature-based identification of sources and sinks. Based on an APKiD [37] analysis of our datasets, obfuscated apps are only a minor share of companion apps (2.66%), but they are more prevalent among general-purpose apps (10.81%). Obfuscation is also an orthogonal problem, and new deobfuscation techniques can readily be adopted. Similarly, we focus our analysis on the Dalvik bytecode of apps, that is, we do not support native code. We currently do not consider code annotations, which the Retrofit library uses to specify request paths and network methods. Both techniques are infrequently used, and not tackling them is a limitation we share with prior work, as existing frameworks struggle to support them and the required engineering effort is substantial. Our DFA supports ICC, but our VSA does not. We plan to extend ICC support to VSA in future work. Only 2.01% of reconstructed values contain ICC data, so this gap does not invalidate our results. Currently, we limit ICC tracking to the same app, but theoretically, ICC can cross app boundaries or come from websites via deep links, providing further avenues for collusion.
For our analysis, we limit the number of backward steps and set a timeout, which trades precision for resources but could lead to missing values and flows. We determined our thresholds empirically; other limits could yield more precise results.
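To illustrate how a step limit and a timeout can cause missing values, the following sketch walks backward over a toy definition graph. The function `backward_values`, the graph encoding, and the default limits are assumptions made for illustration only; the sketch collects constants and does not model string operations, so it is far simpler than the actual VSA.

```python
import time
from collections import deque

def backward_values(defs, start, max_steps=10, timeout_s=1.0):
    """Collect constants reachable backward from `start` in a toy definition graph.

    `defs` maps a variable to its sources: another variable name or a
    ("const", value) tuple. Returns (values, complete), where `complete`
    is False if the step limit or the timeout cut the search short.
    """
    deadline = time.monotonic() + timeout_s
    values, seen = set(), {start}
    queue = deque([(start, 0)])
    complete = True
    while queue:
        if time.monotonic() > deadline:
            return values, False  # timeout: results may be incomplete
        var, depth = queue.popleft()
        for src in defs.get(var, []):
            if isinstance(src, tuple) and src[0] == "const":
                values.add(src[1])
            elif depth + 1 > max_steps:
                complete = False  # step limit reached: values may be missing
            elif src not in seen:
                seen.add(src)
                queue.append((src, depth + 1))
    return values, complete

# Hypothetical graph: url is built from host and a constant path,
# host comes from cfg, and cfg holds a constant domain.
defs = {
    "url": ["host", ("const", "/v1/status")],
    "host": ["cfg"],
    "cfg": [("const", "iot.example.com")],
}
full, full_ok = backward_values(defs, "url")
cut, cut_ok = backward_values(defs, "url", max_steps=1)
```

With the default limit, the walk finds both constants. With `max_steps=1`, it stops before reaching `cfg`, so the domain is missed and the result is flagged as incomplete.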
Motivated by our results on certificate pinning and abandoned domains, we aim to study how companion apps evolve over time. Naturally, identifying network endpoints, protocols, and APIs is only the first step toward truly understanding the security and privacy of device-to-cloud communication in the IoT ecosystem.

RELATED WORK
In the following, we compare IoTFlow to related work in IoT security, IoT companion app analysis, and general-purpose app analysis.
IoT Security. Prior work in IoT security largely focused on identifying attacks on a small set of devices. Wood et al. [96] investigated how medical IoT devices communicate and transmit data, and they found them leaking information, like the measurement frequency, despite using encryption. Chu et al. [23] discovered kids' devices sending PII over unencrypted connections. Other work [13, 20, 38, 65, 93] focused on Samsung's SmartThings apps. SmartThings is a smart hub ecosystem that unifies the control of compatible devices and allows event flow graphs, which are conceptually simple [64]. In contrast, IoTFlow analyzes arbitrary Android apps, which are more widespread and significantly more complex. Correspondingly, the SmartThings techniques do not readily transfer to the entire IoT ecosystem. Related work also investigated the ecosystem via crowd-sourced network traffic collection [46] or telemetry data [53] of real-world user devices, which raises ethical and anonymization challenges.
IoT Companion App Analysis. Several works investigated IoT companion apps in combination with physical devices [21, 61, 80, 81] to find security and privacy issues. For example, Zhou et al. [100] studied the interactions between IoT devices, cloud, and apps using state machines and found issues that can lead to device hijacking. However, they require the IoT devices, which prevents scalability. We overcome this limitation with our new static analysis approach. Wang et al. [94] analyzed companion apps without the corresponding device. Instead of analyzing and determining how and with whom the apps communicate, as we do, they focused on identifying re-branding and propagation of known vulnerabilities. That is, they require prior domain knowledge about other devices and existing vulnerabilities. Similarly, Jin et al. [50] aimed to identify companion apps at scale, and then to identify known vulnerabilities in the apps, such as outdated library versions. Nan et al. [62] analyzed IoT apps with machine learning to detect code that handles IoT-related data, and then assessed whether the behavior was communicated to the user. Naturally, their statistical machine learning approach fundamentally differs from our static program analysis approach. Other work [85, 86, 99, 102] aims to find BLE issues in mobile apps. IoTFlow is more general as we investigate communication beyond BLE, thus obtaining a better understanding of a greater part of the IoT ecosystem.
General-purpose App Analysis. Understanding general-purpose mobile apps has seen significant work. Some approaches use dynamic analysis to run apps in controlled environments to observe their (network) behavior and endpoints [25, 57, 58, 75, 79, 82]. As we observed, the provided inputs impact the analysis, which is an ongoing research challenge [19, 22, 43]. Moreover, to adopt these approaches, one would need the actual IoT devices, making large-scale analysis infeasible. Several approaches use static analysis techniques, like VSA or flow analysis, to extract information about apps, such as network endpoints, API keys, and protocol commands. Gadient et al. [39] extract URLs and JSON schemas to study HTTP(S) usage, private APIs, and code injection vulnerabilities. Extractocol [51] reconstructs HTTP requests based on data flow analysis for automated protocol analysis, but does not scale. Stringoid [77] simulates string concatenations, but cannot reconstruct URLs built in other ways, such as with okhttp3.HttpUrl.Builder. Leakscope [101] reconstructs API keys in mobile apps. Zuo et al. [102] reconstructed BLE UUIDs to identify vulnerable implementations of the BLE pairing process. Wen et al. [95] reconstructed Controller Area Network (CAN) bus commands.

CONCLUSION
We introduced IoTFlow, a new technique for the large-scale security and privacy analysis of IoT devices through their companion apps. With Value Set Analysis (VSA), we extract network endpoints and protocols, which enables us to characterize IoT device behavior for local app-to-device communication and remote communication with cloud backends without requiring the physical IoT device. By combining VSA with Data-flow Analysis (DFA), we trace data flows from IoT devices and sensitive Android methods to better understand what data companion apps share and how. Leveraging IoTFlow, we analyzed 9,889 companion apps and 947 general-purpose apps. We identified striking differences between the two types of apps and discovered various security and privacy problems in the IoT ecosystem, such as abandoned domains, hard-coded credentials, expired certificates, and the use of broken encryption algorithms. Our approach shows clear promise for identifying security and privacy issues of IoT devices at scale, and it could be used to generate privacy labels or verify claimed behavior automatically.

Figure 1: Overview of the IoT ecosystem and its command and control scenarios, including apps as intermediaries.

Figure 2: Overview of IoTFlow. We use VSA to reconstruct endpoints, cryptographic data, and ICC keys for the flow analysis. We use flow analysis to find data leaks and to connect request/response data with endpoints. With the ICC information of the VSA, we support data flows involving ICC.

Listing 2: Simplified example code of an activity that receives BLE data and publishes it via MQTT. Arrows on the left show identification and reconstruction via VSA, marked . Arrows on the right show DFA. Connecting reconstructions are marked

4.3.1 Direct Device Communication. First, we analyze the reconstructed values for indicators of direct communication with the devices, such as local IP addresses, broadcast and multicast addresses, user-configurable addresses (i.e., endpoints from user input; marked

Table 1: Dataset and Performance Overview. For the VSA, the flow analysis, and the total (VSA + flow analysis), we show the average (Avg.), median (Med.), and standard deviation (Std.) of the time per app in [minutes:seconds].

Table 2: Number of Apps using Direct Device Communication. Indicators are hard-coded local network IP addresses (grouped if found in 30 or more apps), user-configurable addresses (fromUI.local), broadcast and multicast, or Bluetooth.
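As a rough illustration of how such address indicators can be told apart, the sketch below labels a reconstructed host string using Python's standard `ipaddress` module. The function `classify_endpoint` and its labels are hypothetical and not IoTFlow's implementation. Subnet-directed broadcast addresses (e.g., 192.168.1.255) need the netmask and are not detected here.

```python
import ipaddress

def classify_endpoint(host):
    """Return a coarse label for a reconstructed host string."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return "hostname"  # not an IP literal, e.g., a domain name
    if addr.is_multicast:
        return "multicast"
    if addr == ipaddress.ip_address("255.255.255.255"):
        return "broadcast"  # limited broadcast only
    if addr.is_loopback:
        return "loopback"
    if addr.is_private:
        return "local"
    return "remote"

# Hypothetical reconstructed values.
labels = {h: classify_endpoint(h) for h in
          ["192.168.4.1", "239.255.255.250", "255.255.255.255",
           "8.8.8.8", "iot.example.com"]}
```

The broadcast check comes before the private-range check because Python's `ipaddress` module also reports 255.255.255.255 as private.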

Table 3: Number of Apps with Reconstructed URL Protocol Schemes. Percentages are relative to the total number of apps with at least one scheme. For IoT-VER, we identified schemes for 871 local endpoints and 7,113 remote endpoints. For GP-2022, we identified schemes for 14 local endpoints and 898 remote endpoints. Protocols marked with a star (*) are based on IoTFlow identifying the corresponding libraries.
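The percentage base described in this caption (apps with at least one scheme) can be made concrete with a small sketch. The helper `scheme_shares`, the package names, and the URLs below are invented for illustration; the sketch only shows how per-scheme app counts and shares could be derived from reconstructed URLs.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def scheme_shares(app_urls):
    """Map each scheme to (number of apps, share of apps with at least one scheme)."""
    apps_per_scheme = defaultdict(set)
    apps_with_scheme = set()
    for app, urls in app_urls.items():
        for url in urls:
            scheme = urlsplit(url).scheme.lower()
            if scheme:  # relative paths have no scheme and are skipped
                apps_per_scheme[scheme].add(app)
                apps_with_scheme.add(app)
    total = len(apps_with_scheme)
    return {s: (len(apps), len(apps) / total) for s, apps in apps_per_scheme.items()}

# Hypothetical reconstructed URLs per app.
app_urls = {
    "app.a": ["https://api.example.com/v1", "mqtt://broker.example.com:1883"],
    "app.b": ["http://192.168.4.1/config"],
    "app.c": ["/relative/only"],  # no scheme, so not part of the base
}
shares = scheme_shares(app_urls)
```

Here, `app.c` contributes no scheme, so the shares are computed over two apps rather than three.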

Table 4: Certificates and Pinning. The first rows show the number of apps in which we found pinning, certificates, and apps containing expired or self-signed certificates. The remaining rows show the corresponding certificates. The number of expired certificates at the time of download is a lower bound for IoT-VER because the download date is not always known.
[2] IoTProfiler and before 2021 for apps by Neupane et al. Apps might have been downloaded later, but this does not threaten validity as our numbers are a lower bound. For IoTSpotter apps, the download date is available as they were downloaded via AndroZoo [2]. Overall, 12.71% of certificates were expired when the apps were downloaded (in 4.79% of apps). In May 2023, 822 (8.31%) apps contained 9,129 (29.18%) expired certificates. Intuitively, expired certificates point to poor security practices and can even prevent communication.
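The two reference points used above (download time and May 2023) can be illustrated with a short sketch. The helper `expired_share` and the expiry dates are hypothetical; the sketch only shows that a later reference date can only increase the expired share, which is why an assumed earlier download date yields a lower bound. It also recomputes the reported app share from the counts in the text.

```python
from datetime import date

def expired_share(not_after_dates, reference):
    """Fraction of certificates whose validity ended before `reference`."""
    if not not_after_dates:
        return 0.0
    expired = sum(1 for d in not_after_dates if d < reference)
    return expired / len(not_after_dates)

# Hypothetical certificate expiry ("not after") dates.
not_after = [date(2020, 1, 1), date(2022, 6, 30), date(2024, 12, 31), date(2030, 1, 1)]
at_download = expired_share(not_after, date(2021, 3, 1))  # 1 of 4 expired
in_may_2023 = expired_share(not_after, date(2023, 5, 1))  # 2 of 4 expired

# Share of the 9,889 IoT-VER apps with expired certificates in May 2023.
app_share = round(822 / 9889 * 100, 2)
```

The last line reproduces the 8.31% stated in the text.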

Table 5: Categorized Endpoints by IoTFlow for IoT-VER, GP-2022, and Comparison between IoTFlow (IF) and Dynamic Analysis (DA). We report the number of unique FQDNs per dataset and shared between them, and, prefixed with #, the number of apps with at least one domain per category, the average number of domains per app, and the standard deviation.
Advertisers and Trackers. We classify the FQDNs to learn who receives data from the app and, via the app, from the devices. We use the domain lists by Ren et al. [82], which they compiled from various ad-blocking lists. Additionally, we use the Exodus tracker list [36]. Table 5 summarizes our results.
IoT-Verified. Overall, 2,959 (29.92%) apps include 487 unique advertisement FQDNs and 1,647 (16.65%) apps use 114 analytics-related FQDNs. Although they belong to different categories, both kinds of domains behave similarly by collecting user information. We also reconstructed 410 FQDNs pointing to Content Distribution Networks (CDNs) in 1,165 (11.78%) apps. Additionally, we identified 84 social network FQDNs shared across 1,046 (10.58%) apps, with the respective standard deviation indicating that if apps use one, they often use more. The remaining 7,248 FQDNs in 4,917 (49.72%) apps do not fall into our categories, and we label them Other.
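A list-based classification like the one described above can be sketched as a lookup of the FQDN and its parent domains in per-category domain sets. The function `categorize`, the suffix-matching rule, and the example domains are assumptions made for illustration; the text does not state how exactly the lists by Ren et al. and Exodus are matched.

```python
def categorize(fqdn, category_lists):
    """Return the first category whose list contains the FQDN or a parent domain."""
    labels = fqdn.lower().rstrip(".").split(".")
    # Candidates: the FQDN itself, then each parent domain (excluding the bare TLD).
    candidates = [".".join(labels[i:]) for i in range(len(labels) - 1)]
    for category, domains in category_lists.items():
        if any(c in domains for c in candidates):
            return category
    return "Other"

# Hypothetical category lists (placeholders, not the real block lists).
category_lists = {
    "Advertisement": {"adnetwork.example"},
    "Analytics": {"metrics.example"},
    "CDN": {"cdn.example"},
    "Social": {"social.example"},
}
ad_label = categorize("eu.adnetwork.example", category_lists)
other_label = categorize("api.vendor.example", category_lists)
```

Domains that match no list fall through to Other, mirroring the label used for uncategorized FQDNs above.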

Table 6: Geographic Location of Network Endpoints. The numbers show the number of endpoints from each location and the ratio to the overall number of endpoints.
We perform our analysis at the FQDN level because the FQDN endpoint receives the data. This granularity is also important because FQDN and eTLD+1 locations can differ. For example, xiaomi.com is hosted in China, but ru.register.xmpush.xiaomi.com is hosted in Russia. We can make multiple observations comparing endpoint locations between companion apps and general-purpose apps. First, substantially fewer endpoints for companion apps are in the US (46.09%) than for general-purpose apps (79.69%). The difference (33.6 percentage points) stems mostly from more Chinese endpoints (27.25% compared to 1.36%, 25.89 pp), with the remainder (7.71 pp) being nearly covered by other Asian countries for IoT-VER (8.44% compared to 1.56%, 6.88 pp). Other regions remain mostly stable.
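The percentage-point arithmetic in this comparison can be checked directly from the shares quoted in the text. The variable names below are ours; the numbers are the ones stated above.

```python
# Endpoint shares in percent, as quoted in the text.
us_gp, us_iot = 79.69, 46.09      # US: general-purpose vs. companion apps
cn_iot, cn_gp = 27.25, 1.36       # China: companion vs. general-purpose apps
asia_iot, asia_gp = 8.44, 1.56    # other Asian countries

us_gap = round(us_gp - us_iot, 2)        # drop in the US share
cn_gap = round(cn_iot - cn_gp, 2)        # rise in the Chinese share
remainder = round(us_gap - cn_gap, 2)    # part of the US gap not explained by China
asia_gap = round(asia_iot - asia_gp, 2)  # rise in the share of other Asian countries
```

The values come out as 33.6, 25.89, 7.71, and 6.88 percentage points, so other Asian countries account for most of the remainder.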

Table 8: Encryption Algorithms. The number of apps that use the respective encryption algorithm and its relation to the number of apps with encryption algorithms (4,069 in IoT-VER and 812 in GP-2022). Recommended algorithms are marked . Algorithms considered insecure or broken are marked ○.

Table 9: Tested Devices. The IoT devices that we tested dynamically together with their device type and package name.