Is a Trustmark and QR Code Enough? The Effect of IoT Security and Privacy Label Information Complexity on Consumer Comprehension and Behavior

The U.S. Government is developing a package label to help consumers access reliable security and privacy information about Internet of Things (IoT) devices when making purchase decisions. The label will include the U.S. Cyber Trust Mark, a QR code to scan for more details, and potentially additional information. To examine how label information complexity and educational interventions affect comprehension of security and privacy attributes and label QR code use, we conducted an online survey with 518 IoT purchasers. We examined participants’ comprehension and preferences for three labels of varying complexities, with and without an educational intervention. Participants favored and correctly utilized the two higher-complexity labels, showing a special interest in the privacy-relevant content. Furthermore, while the educational intervention improved understanding of the QR code’s purpose, it had a modest effect on QR scanning behavior. We highlight clear design and policy directions for creating and deploying IoT security and privacy labels.


INTRODUCTION
Security and privacy vulnerabilities of Internet of Things (IoT) products have long been exploited, resulting in leakage of personal information and eavesdropping on communication between devices [2,28], remote takeover of smart devices that creates physical risks [11], and unauthorized sensing, data collection, and use of personal information [4,6,37,42]. Users have expressed concerns over security risks and privacy-invasive data practices of IoT devices [19,29] but find it difficult to act on these concerns, thereby putting themselves at risk [12]. Proposals for security and privacy labels affixed to IoT packaging show promise as an effective means of educating users about IoT security and privacy practices and promoting more informed device purchase decisions [19].
The United States government has been working towards the establishment of an IoT security and privacy labeling program for several years. In 2021, Executive Order 14028 tasked the National Institute of Standards and Technology (NIST) with developing a cybersecurity baseline for IoT products [25]. NIST subsequently issued a white paper that included basic criteria for labeling [47]. In October 2022, the White House convened representatives from the U.S. government, industry, and academia to discuss ideas for a national cybersecurity labeling program for IoT devices [26]. In July 2023, the White House announced a voluntary IoT cybersecurity labeling program and unveiled a "U.S. Cyber Trust Mark" that would certify the fulfillment of basic cybersecurity criteria [27]. A month later, the Federal Communications Commission (FCC) issued a Notice of Proposed Rulemaking (NPRM) soliciting public comments on a framework for a layered binary label including the U.S. Cyber Trust Mark and a QR code that can be scanned for information about specific IoT devices [8].
Previous research has focused on the design of IoT labels based on input from experts and iterative consumer testing [15,19]. Emami-Naeini et al. proposed and evaluated a two-layer label design comprising a primary layer with the most salient security and privacy attributes for consumers and a QR code at the bottom leading to a more comprehensive secondary layer designed for experts [14,16,17,18]. However, IoT manufacturers have advocated for minimal labels that include only a Cyber Trust Mark and QR code, citing limited space on product packaging. Prior work has neither compared consumer preferences for minimal versus more expansive labels nor evaluated the comparative effectiveness of these approaches.
Our research aims to shed light on consumer preferences for, and the effectiveness of, three designs of varying complexity for IoT security and privacy labels on product packaging. Specifically, we investigate the following research questions:
• RQ1: What is the impact of complexity level on consumers' understanding of the information on the labels?
• RQ2: What is the impact of complexity level on consumers' interactions with labels (a) during the study and (b) their self-reported expected future interactions with labels?
• RQ3: What is the impact of complexity level on consumers' preferences for the labels?
• RQ4: Which label attributes do consumers report would most influence their decisions to purchase IoT devices?
• RQ5: What is the impact of (a) a brief educational intervention, (b) age, (c) gender, and (d) technical background on consumer understanding, interactions, and preferences for the three labels studied?
We conducted an online survey of 518 purchasers of IoT devices to examine their preferences about the complexity of the labels on device packaging and their ability to use these on-package labels as well as labels accessed through a QR code. We created a high-complexity label and an ultra-high-complexity label based on Emami-Naeini et al.'s label designs and the U.S. Cyber Trust Mark. We created a low-complexity label that included only a QR code and the U.S. Cyber Trust Mark. We also created a medium-complexity label that added a few of the most important elements from the high-complexity label to the minimal low-complexity design.
We assigned participants randomly to the low-, medium-, or high-complexity level and showed them labels for three functionally identical IoT devices (smart thermostats) with differing security and privacy properties under fictitious brand names. In addition, we randomly assigned half of the participants in each complexity group to view a brief educational intervention introducing the Trust Mark and QR code prior to beginning the survey. We asked participants questions to assess their comprehension of the labels and their ability to use them to compare products. As shown in Figure 1, participants who chose to scan the QR code on a label in the survey were redirected to a website displaying a higher-complexity label. Once on the website, participants could interact with the label to obtain further information until they reached the ultra-high-complexity label. At the end of the survey, we showed participants labels from all four complexity levels and asked them about their most preferred label option.
We found that most participants did not scan the QR code, even when asked to answer questions based on seeing only the low-complexity label containing nothing but the QR code and the Trust Mark. Participants generally favored labels with more information, and preferred to have that information readily available on the package itself rather than only accessible by scanning a QR code. Fewer than 2% of participants preferred the low-complexity label when given a choice between labels. Our results also indicate that those who received a brief educational intervention at the beginning of the survey had a better understanding of the Cyber Trust Mark and the QR code. Despite this, the effect of education on motivating them to scan the QR code was limited. Participants were most interested in seeing labels that included information about the devices' sensors, data collection and purposes, data sharing practices, and information about security updates.
We recommend that as policymakers define requirements for products to receive the U.S. Cyber Trust Mark and consider designs for accompanying IoT labels, they focus on label designs similar to our medium- and high-complexity labels. These designs should provide both security and privacy information important for consumer decision-making on product packaging and use QR codes to provide more detailed information. As designs are refined, they should be informed by further consumer testing. Finally, we recommend sustained educational campaigns and in-store signage to inform consumers about the U.S. Cyber Trust Mark and how to use the accompanying label to make informed purchase decisions.

BACKGROUND AND RELATED WORK
We first discuss security threats to IoT devices. We then introduce the concept of labels and discuss prior work on the design and evaluation of labels in privacy and security contexts, particularly IoT devices. We end with a brief review of recent government and industry efforts to standardize IoT security and privacy labels.
IoT security and privacy threats. IoT device owners are susceptible to security breaches that may result in unintended exposure of personal or private information, including daily activities monitored by device sensors [1,58,59]. Numerous cybersecurity attacks targeting IoT devices have been recorded, such as the Mirai botnet incident in 2016, when a worm-like family of malware named Mirai launched massive distributed denial-of-service (DDoS) attacks, resulting in 600k infections at its peak, and brought down many popular websites [1,30]. Mirai has since inspired more advanced IoT botnets [24]. Despite the security and privacy risks associated with IoT devices, details about device security and privacy are generally not available to consumers. As a result, consumers often purchase IoT devices without knowing about the potential privacy and security risks associated with them or which devices include features that may help mitigate risks [29,44]. Recent studies have shown that consumers want information about security and privacy when making smart device purchases but lack a reliable means to access this information [19,22,60].
IoT labels. Labels communicate standardized information about consumer goods in a concise and organized manner. For example, nutrition facts and drug facts labels have helped consumers make informed purchasing decisions about food and pharmaceuticals. Past research has demonstrated that privacy labels on websites are more effective at aiding comprehension and enabling easy access to information than traditional text-based privacy policies [31,32,35]. Kelley et al. developed an Android app privacy label and found that study participants who were presented with labels in the app store often chose more privacy-protective apps than those not shown the labels [33]. More recently, privacy labels have been introduced in both iOS and Android app stores. However, studies have found their terminology and layouts can confuse both consumers and app developers [9,39,40,61], suggesting extensive user testing is needed in the development of new labels.
Railean and Reinhardt developed and evaluated a "Privacy facts" label for the European context that included information about an IoT device's sensors, data collection, data recipients, data processing purposes, retention periods, and data flows.Their label also included a QR code leading to actual data samples [50,51].
Emami-Naeini et al. interviewed U.S. consumers about their needs for both security and privacy information when purchasing IoT devices [19]. They also interviewed IoT security and privacy experts about the information most important for making an informed purchase decision [15]. Using what they learned from experts and consumers, they developed a two-layer IoT label for the U.S. context and showed that it is both usable and informative for consumers [16]. In their CMU IoT Security and Privacy Label (CISPL) design, the primary layer contains the information that is most salient to consumers and a QR code leading to a more detailed secondary layer that adds information of interest to experts. In subsequent work, the authors demonstrated that consumers accurately differentiated between more and less risky attributes included on the labels and that label information impacted their willingness to purchase IoT devices [17]. In their most recent work, Emami-Naeini et al. demonstrated that consumers would be willing to pay a significant premium for more secure and private IoT devices, as compared to devices with bad practices or those without any disclosures, if security and privacy information was disclosed and made readily available [18].
Label regulation and policy. Governments worldwide have taken steps to promote, enforce, and standardize the use of IoT privacy labels [16,19]. The CISPL specification provides a list of device security and privacy attributes and associated global standards [57]. Finland and Singapore have recently developed IoT label standards [46,55]. Singapore uses a tiered rating system (1 to 4 stars) based on requirements set forth by ETSI 303 645, which is the European security standard for IoT devices [10]. The 1-star rating requires meeting the baseline ETSI standard, while the 3- or 4-star ratings are given for independent verification and binary analysis by test labs and penetration testing by separate third parties, respectively [45]. Germany and Finland have a reciprocal arrangement for their own IoT schemes, which recognize devices that meet the Singapore standard, and vice versa [46]. Recently, the European Union passed the Cyber Resilience Act [56], which addresses the cybersecurity of connected devices. The expectation is that additional security requirements will be added to existing requirements that manufacturers have to meet to get the European "CE Mark," which signifies that a product meets various safety, health, and environmental requirements. It is not clear whether the E.U. will require adding a QR code or include any other information besides the CE Mark.
In response to the U.S. White House Executive Order 14028 in 2021, the National Institute of Standards and Technology (NIST) developed criteria for an IoT products labeling program [25,43,47]. Subsequently, the U.S. Federal Communications Commission (FCC) unveiled the Cybersecurity Labeling Program for smart devices, including the U.S. Cyber Trust Mark. The Mark is intended to help Americans make informed choices about smart devices by indicating which devices meet a set of baseline criteria and providing additional security and privacy details via a QR code on product packaging. In August 2023, the FCC solicited input on the details of the label that will accompany the Mark on product packaging as well as the more detailed label accessible through the QR code [7,27].
The Consumer Technology Association (CTA) has convened working groups in an effort to reach a consensus on details of the U.S. Cyber Trust Mark program and has announced plans to submit comments to the FCC [3].The authors have observed that some IoT device manufacturers and retailers who are participating in these working groups have expressed concerns about space constraints when placing labels on physical product packaging and are advocating for compact package labels that include only the Cyber Trust Mark and a QR code.This research contributes to the discussion by providing empirical data on consumer preferences for labels of varying complexity as well as the impact of label complexity on consumer comprehension and behavior.

METHODS
In this section, we detail our pilot studies, participant recruitment, label design, survey protocol, and data analysis process.
Ethical considerations. Our study protocols were reviewed and approved by the Carnegie Mellon University Institutional Review Board (IRB). All study participants provided their consent using online forms approved by our IRB. As the study was conducted using the Prolific platform, we collected participants' Prolific IDs to facilitate payment. We collected no other personally identifiable information from participants, and we do not know the real-world identities associated with Prolific IDs.

Pilot Studies
We conducted pilot studies in the Spring and early Summer of 2023 that helped us iteratively refine our study protocol and label designs. These studies employed protocols fairly similar to the one used in our final study, described below. In addition to differences in label content and design, question format, and purchasing scenarios, our preliminary studies did not include functional QR codes or the U.S. Cyber Trust Mark (announced after these pilot studies).

Participant Recruitment
We conducted an online study of U.S.-based IoT device purchasers recruited on Prolific. To achieve a more representative sample through stratified sampling, we utilized Prolific's gender-balanced distribution feature and recruited a similar number of participants from three different age groups (18-35, 36-53, 54+), roughly proportionate to the U.S. age and gender distribution.¹ Using Prolific's built-in prescreening tools, the posts were shown only to participants who self-reported owning a device from a predetermined list of qualifying IoT devices (see Appendix C). All participants were then redirected to the same prescreen survey. Participants who claimed to have purchased at least one IoT device in the past three years were then given a link to the main survey. Participants received $0.50 (median of $9/hour) as compensation for completing the prescreening survey and an additional $5 (median of $25/hour) for the main survey.

Label Design
We tested three IoT security and privacy label designs, which we refer to as low-, medium-, and high-complexity labels (see Figures 2, 3, and 4). Each label included the U.S. Cyber Trust Mark and a QR code that users could scan to retrieve a more detailed label. Users who scanned the low-complexity and medium-complexity labels were shown the high-complexity label, and users who scanned the high-complexity label were shown an even more detailed label, which we refer to as the "ultra-high-complexity label" (see Figure 5). At the end of the study, participants were shown labels of all four complexity levels and asked which they would prefer to see on product packages.
Our label designs were based on the primary and secondary labels proposed by Emami-Naeini et al. [14,15]. We used the primary layer of CISPL as our high-complexity label, which linked to our ultra-high-complexity label, based on the CISPL secondary layer, when the QR code was scanned. After receiving feedback from pilot studies that the QR code on the high-complexity label was difficult to scan, we made some small alterations to the label layout to increase the size of the QR code and the quiet zone around it. Additionally, we added a button next to the QR code on the high-complexity labels displayed after scanning so that participants could click to conveniently retrieve the ultra-high-complexity label (see Appendix E).
To develop the medium-complexity label, we focused on four attributes that Emami-Naeini et al. found to be strongly associated with increasing consumers' willingness to purchase, including two security attributes (security updates and access control) and two privacy attributes (data collected and data shared) [17]. We also included symbols indicating the presence or absence of cameras or microphones in response to proposed U.S. legislation requiring internet-connected devices to disclose camera or audio recording capabilities [53]. Based on feedback gathered from pilot surveys, we iteratively enhanced the medium-complexity label, which serves as a middle ground between the comprehensive high-complexity label and the minimal low-complexity label.
We designed the low-complexity label to show only the U.S. Cyber Trust Mark and a QR code, which, if scanned, would lead to more detailed information. It was formatted exactly the same as

Survey Design
We conducted a between-subjects survey in which one-third of the participants were randomly assigned to each label complexity level. For each of these three groups, we provided a brief educational intervention to half of the participants. The intervention (shown in Appendix B) included an image of the U.S. Cyber Trust Mark, a brief explanation of its significance, and a note that consumers can "scan the accompanying QR code to get more information about the product's security and privacy attributes." At the bottom of the intervention page were two questions testing participants' comprehension of the purpose of the U.S. Cyber Trust Mark and QR code. We implemented the survey such that only respondents who answered both questions accurately could proceed to the next section. Participants were permitted to change their answers until they were able to answer both questions correctly.
Participants were presented with labels at their assigned complexity level for three fictional IoT thermostats with identical functionality but varying security and privacy attributes. As shown in Figure 6 for the medium-complexity group and Figure 7 for the high-complexity group, the three IoT thermostats included a device with strong privacy and security features, a device with medium privacy and security, and a device with weak privacy and security. We tried to select strong and weak values that could be clearly distinguishable by non-experts (e.g., more sharing implies weaker privacy than less sharing, and consent-based security updates are stronger than no security updates). To prevent external factors from influencing participants' decisions, we chose fictitious brand names that were distinct from existing brands.
We generated a unique QR code for each participant with QR-Code.js [52], overlaid on top of the labels through Qualtrics, directing those who scanned to a label hosted on our research group's web server. This enabled us to track participant scanning through the unique URLs that appeared in our weblogs.
The survey included multiple-choice questions, Likert-scale questions, and open-ended questions to quantitatively and qualitatively assess participant comprehension of label information, perception of the usefulness of label information, and ease or difficulty using the labels and QR codes. Near the end of the survey, we presented participants with the low-, medium-, high-, and ultra-high-complexity labels and asked them which one they preferred to see on the product packaging and upon scanning a QR code. Finally, we asked them to rate the importance of various factors when purchasing an IoT device (Q31) and to indicate their agreement with four statements about their security and privacy behavior (Q32). As we wanted to ask only a few questions and have coverage of both security- and privacy-related behaviors, we did not use an established scale [13] but instead included four questions to cover the tendency to read privacy policies, motivation to keep accounts safe (from SA-6) [20], cookie blocking, and use of two-factor authentication. We provide all of our survey questions in Appendix A.
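The 3 × 2 design described above (three complexity levels, each with and without the intervention) can be sketched as follows. This is only an illustration under stated assumptions: the survey platform's actual randomizer is not described in this paper, and the function name, seed, and balanced round-robin allocation are hypothetical choices.

```python
import random
from itertools import product

# Six cells: (complexity level, receives educational intervention)
CELLS = list(product(["low", "medium", "high"], [True, False]))

def assign(participant_ids, seed=0):
    """Shuffle participants, then deal them round-robin into the six cells.

    Hypothetical helper: balanced allocation is an assumption, not the
    documented behavior of the survey software used in the study.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: CELLS[i % len(CELLS)] for i, pid in enumerate(ids)}
```

Group sizes after exclusions need not be exactly equal, since removals for screening or data quality happen after assignment.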
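As a rough illustration of per-participant scan tracking, the sketch below builds a unique URL for each QR code and recovers the tokens that later appear in server logs. The host name, query parameter names, and log format are all hypothetical; the study's actual URL scheme and log layout are not specified here.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical host and path; not the study's real server.
BASE_URL = "https://labels.example.org/label"

def make_scan_url(device: str, complexity: str) -> tuple[str, str]:
    """Return (token, url) for one participant's QR code."""
    token = secrets.token_urlsafe(8)  # random, not derived from any identifier
    query = urlencode({"device": device, "level": complexity, "t": token})
    return token, f"{BASE_URL}?{query}"

def tokens_in_log(log_lines: list[str]) -> set[str]:
    """Collect tokens from request paths in simplified access-log lines."""
    seen = set()
    for line in log_lines:
        # Assumed log format: '<timestamp> GET <path-with-query>'
        parts = line.split()
        if len(parts) < 3:
            continue  # skip malformed lines
        params = parse_qs(urlparse(parts[2]).query)
        seen.update(params.get("t", []))
    return seen
```

A token that appears in the log indicates that the corresponding QR code was opened at least once; it does not by itself distinguish repeated scans from page reloads.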

Data Analysis
We performed a quantitative analysis to look for significant differences between our treatment conditions (label complexity, educational intervention) as well as across demographic groups (age, gender, and technical background). For independent variables with more than two categories (age, label complexity), we adopted two-stage testing: an overall omnibus test, followed by pairwise tests if the omnibus test was significant. Independent variables with two categories (education and gender) were tested directly with pairwise tests. For questions (i.e., dependent variables) with multiple-choice responses, we used Fisher's Exact test if more than 20% of the entries in a contingency table had less than or equal to 5 observations [21,34]. For the remaining multiple-choice questions, which satisfy the Chi-square assumption, we performed Pearson's Chi-squared test [48].
For multi-select questions, we interpreted each of the possible options as a binary multiple-choice question with responses being True or False. We then tested each sub-question for significance using the same procedure as multiple-choice questions with two options.
For Likert-scale questions or numeric-response dependent variables (e.g., number of QR code scans), we measured rank significance across complexity groups and demographic groups using the Kruskal-Wallis omnibus test [36]. If the omnibus test was significant for a specific question across all tested groups, we then performed the Mann-Whitney U test on each pair of groups to determine pairwise significance.
We applied a post hoc Benjamini-Hochberg procedure to all p-values globally, in order to control the false discovery rate (FDR) under multiple testing [5].
We conducted a qualitative analysis of open-ended responses based on a codebook developed jointly by three authors of this paper. During the formative and pilot studies, two or three authors coded every response while maintaining a high agreement rate. For the main study, two authors independently coded all open-ended questions, agreeing on the codebook and relevant assumptions. After completing the coding process, the two coders reconvened to review all responses and reached a consensus on the codes for every response. We computed the Kupper-Hafner concordance (a form of inter-rater reliability for when units, i.e., responses, are coded with multiple codes) of the two independently coded sets for a total of 10 codebooks, and obtained a maximum, minimum, and average agreement of 0.76, 0.58, and 0.68, respectively, which indicates substantial agreement [23,38,41]. All IRR numbers are provided in Appendix D.
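The test-selection rule above can be sketched in a few lines. The code follows the rule as worded in this section (share of table entries with at most 5 observations); many textbooks state the analogous rule in terms of expected rather than observed counts. Function names and the example tables are hypothetical.

```python
def sparse_cell_share(table):
    """Fraction of cells in a contingency table holding 5 or fewer observations."""
    cells = [count for row in table for count in row]
    return sum(1 for c in cells if c <= 5) / len(cells)

def choose_test(table, threshold=0.20):
    """Pick Fisher's exact test for sparse tables, Pearson's chi-squared otherwise."""
    return "fisher_exact" if sparse_cell_share(table) > threshold else "pearson_chi2"

def chi2_statistic(table):
    """Pearson chi-squared statistic; assumes no row or column total is zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat
```

For instance, a table such as [[2, 30], [40, 3]] has half of its cells at or below 5 and would be routed to Fisher's exact test, while [[10, 20], [20, 10]] would go to the chi-squared test.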
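A minimal sketch of this expansion, assuming each participant's multi-select answer is stored as a set of chosen option labels (the data layout and function names are hypothetical):

```python
from collections import Counter

def expand_multiselect(responses, options):
    """Map each option to a list of True/False flags, one per participant."""
    return {opt: [opt in chosen for chosen in responses] for opt in options}

def contingency(flags, groups):
    """Build a table with one row per group (sorted) and columns [True, False]."""
    counts = Counter(zip(groups, flags))
    return [[counts[(g, True)], counts[(g, False)]] for g in sorted(set(groups))]
```

Each resulting table has two columns, so it can be passed to the same test-selection step used for two-option multiple-choice questions.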
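The two rank-based statistics can be illustrated with a small standard-library sketch. It computes the tie-corrected Kruskal-Wallis H statistic and the Mann-Whitney U statistic; the p-value shortcut applies only to exactly three groups, where the chi-squared approximation has 2 degrees of freedom. This is an illustration, not the analysis code used in the study, and it omits the p-value for U.

```python
import math
from collections import Counter

def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Tie-corrected H statistic; undefined if every observation is identical."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))

def p_value_three_groups(h):
    """Chi-squared survival function with 2 degrees of freedom: exp(-h / 2)."""
    return math.exp(-h / 2)

def mann_whitney_u(a, b):
    """U statistic for sample a, from its rank sum in the pooled sample."""
    ranks = average_ranks(list(a) + list(b))
    return sum(ranks[:len(a)]) - len(a) * (len(a) + 1) / 2
```

In the two-stage procedure, `mann_whitney_u` would only be run on pairs of groups after the omnibus p-value falls below the chosen significance level.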
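For readers unfamiliar with the procedure, the following sketch computes Benjamini-Hochberg adjusted p-values from a list of raw p-values. The function name and example values are illustrative only.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is then declared significant when its adjusted value falls below the target FDR level (for example, 0.05).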

Limitations
We recruited participants using the Prolific crowdsourcing platform. While such platforms are popular in research studies, including other studies that solicited consumers' security and privacy perceptions for IoT devices [15,19], they are not completely representative of the general public. In addition, our participants were taking a survey and not physically visiting stores to purchase IoT products with labels. Thus, their observed behavior may not exactly match what they would do in real life, and their self-reported expected behavior may reflect biases that stem from being a study participant. Furthermore, purchase decisions in real life are likely influenced by other factors such as brand recognition, price, and functionality features. We have attempted to carefully control for these confounding factors by designing a relatively realistic scenario but using fictitious products and reminding our participants that, besides any differences illustrated on the labels, all other functionality-related features of the devices whose labels are shown are identical.
In our study, we recruited a gender-balanced and age-balanced (in three age buckets) set of participants from the U.S. only. We believe this is appropriate as we were testing the U.S. Cyber Trust Mark and accompanying label specifically. Thus, our findings may not generalize to other IoT cybersecurity marks or labels such as those from Singapore or the E.U. [50,51].

RESULTS
First, we present a summary of our participant demographics. Then, we present our results on the impact of complexity level on participants' understanding of the labels (RQ1), followed by how participants used the labels and QR codes during the study and how they would expect to use them if they encountered them on products (RQ2). Next, we present our results related to consumer preferences and attributes that would influence consumer decisions (RQ3 and RQ4). Finally, we discuss the impact of our educational intervention, age, gender, and technical literacy on consumer understanding, interactions, and preferences of labels (RQ5).

Participants
559 participants completed the survey and received compensation, with a median completion time of 12 minutes and 1 second. We filtered out responses from 41 participants according to criteria we established prior to survey distribution. As we required participants to have purchased an IoT device in the past three years, we removed 36 participants who had not done so based on their response to an open-ended pre-screen question that asked them to list the IoT devices they had purchased over the last three years (many of these participants mentioned purchasing only phones, tablets, computers, or other non-IoT devices). We removed a total of four participants based on the detection of survey straightlining, including one participant who responded with the same Likert-scale rating for all but one of the Likert-scale questions. The other three of the four participants removed for suspected straightlining responded to over 85% of Likert-scale questions with the same rating but responded to other questions in a way that clearly contradicted opinions expressed through their Likert ratings. Finally, we removed one participant who provided nonsensical responses to all three open-ended questions on the main survey.
Out of the 518 remaining participants, 176 were assigned to the low-complexity group, 172 to the medium-complexity group, and 170 to the high-complexity group. 179 participants were between the ages of 18 and 35, 177 were between the ages of 36 and 53, and 162 were age 54 or older. 30.9% of participants self-identified as having a technical background. Demographic information is shown in Table 1. To understand participants' interests in privacy and security, we asked them four questions about their security and privacy behaviors. As shown in Figure 8, most participants reported taking steps to keep their data and accounts safe, block cookies, and use two-factor authentication. However, a large percentage of participants reported that they do not typically read privacy policies.
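The straightlining screen can be approximated by measuring how often a participant gave their most common Likert rating. The sketch below only flags responses for manual review, since the exclusions described above also depended on contradictions with other answers; the function names are hypothetical and the 85% default mirrors the figure mentioned in the text.

```python
from collections import Counter

def modal_share(likert_ratings):
    """Share of Likert answers equal to the participant's most frequent rating."""
    counts = Counter(likert_ratings)
    return max(counts.values()) / len(likert_ratings)

def flag_for_review(likert_ratings, threshold=0.85):
    """Flag a response when more than `threshold` of ratings are identical."""
    return modal_share(likert_ratings) > threshold

# Exclusion arithmetic reported above: 36 + 4 + 1 = 41 removed from 559.
REMAINING = 559 - (36 + 4 + 1)
```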

Understanding the Labels (RQ1)
To measure how well participants would understand and use labels, we created tasks involving label use. We first displayed three labels of the same complexity to participants, instructing them to imagine these labels were on physical product packages. Each label depicted a functionally similar smart thermostat with different security and privacy attributes (shown in Figure 6 and Figure 7), enabling controlled comparison. We asked participants which product they would be most likely to purchase after viewing the labels, followed by questions about specific information contained in the labels and overall comprehensibility/usefulness questions.
As the products were depicted as being functionally identical except for the security and privacy attributes and any differences shown on the label, we expected that participants would be most likely to select the product with the best security and privacy attributes if they had reviewed and understood the information on the medium-, high-, or ultra-high-complexity labels. When asked which device they would purchase, participants' selections differed significantly between all label groups (p < 0.001 between low-complexity and medium-complexity groups and between low-complexity and high-complexity groups; p = 0.01 between medium-complexity and high-complexity groups). As shown in Figure 9, 55.3% and 62.8% of participants in the high-complexity and medium-complexity groups, respectively, selected Sustios, which had the best privacy and security features. 68.3% of participants in the low-complexity group said they did not have sufficient information to make the decision. While participants in the low-complexity group were not shown sufficient information on the label, they could retrieve more information through the QR code, but we observed that 67% did not scan the QR codes.

Participants were then asked to use the labels to identify which product had a particular security or privacy attribute. As shown in Figure 10, we found that 98.3% and 99.4% of medium- and high-complexity participants, respectively, were able to correctly identify the device that uses a camera or other visual sensor (Q5) compared to 24.3% of low-complexity participants (p = 0.003 between low and medium, and p = 0.003 between low and high). Similarly, as shown in Figure 12, 95.9% and 79.4% of medium- and high-complexity participants, respectively, were able to correctly select the device that provided consent-based security updates (Q8) compared to 21.6% of low-complexity participants (p = 0.003 between low and medium, and p = 0.003 between low and high). In this case, the medium-complexity group achieved a significantly higher accuracy rate compared to the high-complexity group (p = 0.003). For these questions, the medium- and high-complexity package labels included the information needed to find the correct answer, while the low-complexity group had to scan the QR codes to find the correct answer.²
In line with low-complexity results, the medium-complexity group's performance declined significantly if the security and privacy attribute in question was not shown on the medium-complexity packaging label and had to be accessed via the QR code. As shown in Figure 11, less than 5% of the medium-complexity group correctly answered our question about who data is shared with (Q6), compared to nearly 20.6% for low-complexity (p = 0.003) and more than 77.4% for high-complexity (p = 0.003).
We tested all attributes for differences between groups who did and did not receive an educational intervention and found significant differences only for the Cyber Trust Mark.
All of the label attributes other than the Cyber Trust Mark had been tested in prior user studies and found to be fairly well understood [15,17]. We asked participants to self-report how well they understood each label attribute that appeared on the packaging label for their condition (Q18). This allowed us to confirm that our participants also felt they understood the attributes and to compare the understanding of the Cyber Trust Mark to the understanding of other attributes. In Figure 13, with the exception of the Cyber Trust Mark, we ranked these attributes from best to least understood. While most participants said they understood the QR code and other security and privacy attributes, fewer said they understood the Cyber Trust Mark. As will be discussed further in Section 4.5, those exposed to the educational intervention had a significantly different understanding of the Cyber Trust Mark but not the other attributes.

Consumer Behavior and Intentions (RQ2)
We used our shopping scenario and product comparison tasks to create a controlled but relatively realistic scenario to observe how participants would likely interact with IoT package labels in the wild. We extracted data from our web server logs to observe when participants interacted with labels through QR codes (RQ2a) and asked survey questions to gain an understanding of the reasons behind participants' behavior and their self-reports of how they would likely use such labels in the future (RQ2b).
First, we asked participants whether they would consider the labels if they were shopping for a product and saw them on the packaging (Q1). As shown in Figure 14, a higher percentage of participants from the medium- and high-complexity groups responded that they would examine the information presented, including looking for anything particularly concerning (p < 0.001 between low and medium, and low and high), carefully comparing the labels (p < 0.001 between low and medium, and p = 0.002 between low and high), and thoroughly examining the labels (p < 0.001 between low and medium, and p = 0.007 between low and high). Participants in the low-complexity condition were most likely to say they would scan the QR code (p = 0.003 between low and medium, and p = 0.001 between low and high). For most of these options, no significant difference was found between the medium- and high-complexity groups.
Using web server log data, we calculated the percentage of participants within each complexity group who scanned different numbers of the QR codes they were shown. As shown in Figure 15, 33.0% of participants in the low-complexity group scanned the QR code on at least one label, while the figure drops significantly for the medium- and high-complexity groups to 4.7% and 12.4%, respectively (p = 0.003 between low- and medium-complexity, p = 0.003 between low and high, and not significant between medium and high). Among participants who scanned more than three QR codes, 13 were from the low-complexity group and scanned an average of 6.8 times. These participants scanned the same QR codes more than once as they went back and forth between labels, looking for information to answer questions that those in the low-complexity group could access only through the QR codes. Likely, they did not know how to return to the previously scanned labels using the browser on their phones. Only one participant from each of the medium- and high-complexity groups scanned more than three times. We also found significant differences in the scanning behavior between the education and the no-education group, as discussed in Section 4.5.
We asked the participants who self-reported in the survey that they did not scan the QR code to identify the primary reason for not doing so (Q12). Our results, shown in Figure 16, illustrate that the most common reason was the time burden (32.7%), followed by not being interested in the information (26.4%). A large number of participants (22.8%) were also worried that scanning QR codes could be insecure, which is a realistic threat [49].
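The per-group scan rates reported from the web server logs amount to aggregating logged scan events by participant and then by condition. A minimal sketch of that aggregation follows, assuming a hypothetical log that has already been reduced to one participant ID per scan; the function name, the data layout, and the toy records are illustrative assumptions, not the study's actual pipeline or data.

```python
from collections import Counter


def scan_rate_by_group(assignments, scan_events):
    """Return the percentage of participants per group with at least one scan.

    assignments: dict mapping participant ID -> label complexity group
    scan_events: iterable of participant IDs, one entry per logged QR scan
    (both structures are hypothetical stand-ins for real log data)
    """
    scans_per_participant = Counter(scan_events)
    group_sizes = Counter(assignments.values())
    # Count participants in each group who scanned at least once.
    scanners = Counter(
        group
        for participant, group in assignments.items()
        if scans_per_participant[participant] > 0
    )
    return {
        group: 100 * scanners[group] / size
        for group, size in group_sizes.items()
    }


# Toy records for illustration only; these are not study data.
toy_assignments = {"p1": "low", "p2": "low", "p3": "medium", "p4": "high"}
toy_events = ["p1", "p1", "p4"]  # p1 scanned twice, p4 once
print(scan_rate_by_group(toy_assignments, toy_events))
```

Counting participants rather than events keeps repeat scans by the same person from inflating the rate, which matters here because some low-complexity participants re-scanned the same codes several times.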
We asked participants how likely they would be to scan a QR code for more information if they saw a label when actually shopping for a device (Q10). Across all conditions, 44.4% of participants said they were likely or very likely to scan the QR code. There were no significant differences between conditions.
We asked our participants what they would likely do to get more information if they were in a store and saw the label of their assigned complexity level with a QR code on the packaging (Q11). Across all groups, about half said they would scan the QR code (49.6%). Others said they would search online (35.1%) or visit the manufacturer's websites (7.1%). There were no significant differences found between complexity groups.

Consumer Preferences (RQ3 and RQ4)
We asked participants a series of questions about their opinions about the specific label they viewed. Additionally, if the participant scanned the QR code, we asked them a series of questions related to the retrieved label. We included open-ended questions that elicit participants' opinions on labels they saw and ideas for potential improvements.
We first asked the participants which attributes on the label would most influence their decision to purchase the IoT device (Q20). The influence ratings, shown in Figure 17, highlight that participants found privacy attributes (such as whether the data was sold to third parties, shared, stored, or collected) as well as security attributes (including security updates and access control) to be influential to their purchase decision. Again, the only attribute for which education made a difference was the Cyber Trust Mark, which we will further discuss in Section 4.5.
Similar to our results for understanding, the QR code and the Trust Mark were reported to be the two least influential elements. Note that we did not ask participants to explicitly compare a device with a Cyber Trust Mark (indicating compliance with baseline standards) to a device without one (indicating lack of compliance). Thus, we cannot assess the influence of the Trust Mark in consumers' decision-making between devices with and without a Trust Mark.
As shown in Figure 18, when we asked participants how helpful they found the information on the packaging label (Q22), 68.6% of participants from the medium-complexity group and 78.8% from the high-complexity group found the information presented somewhat or extremely helpful, while a significantly lower percentage (17.1%) from the low-complexity group found it helpful (p < 0.001 for all three pairwise tests).
To better understand participants' preference for label complexity, we asked them whether the package label they were shown had enough information, too much information, or just about the right amount (Q24). As shown in Figure 19, we found that only 15.3% of the participants in the low-complexity group found the level of information just right, with 80.1% reporting that the level of information was not enough. In contrast, for the medium- and high-complexity labels, participants reported them being just right in terms of information presented 51.2% and 78.8% of the time, respectively, far exceeding the low-complexity group (p = 0.003 between low and medium, and [...]). The high-complexity group also had a significantly higher percentage of participants who said the label shown to them contained the right amount of information (p = 0.003). Notably, only a small percentage of participants from any condition thought that there was too much information: 5.3% from the high-complexity group and less than 2% from the low- and medium-complexity groups.
We also asked participants what additional information they would like to see on the labels in an open-ended question (Q27). In every condition, a large number of people remarked that they wanted the label to contain more information. 63.6% of participants in the low-complexity condition remarked that they wanted more information, and some mentioned specifically wanting more actual information on the packaging label without having to scan a QR code. One commented, "The details should be directly on the label. No business should expect a customer to scan some random QR code." Another participant wrote, "Any information on security and privacy would help. This doesn't give much info." The only specific information that participants frequently requested was more information about the types of data shared (requested by 42.0% of participants from the medium-complexity and 16.0% of participants from the low-complexity groups who did not scan). One participant from the medium-complexity group stated, "I need more clarification about who it shares the information with directly on the label." Note that those in the medium-complexity group would have seen a list of the types of information shared, but with no details about the sharing, whereas those in the low-complexity group who did not scan would not have seen any mention of sharing. Participants from the high-complexity group were more likely to be satisfied with the amount of information on the label. One wrote, "There is already a lot of information on the label, I wouldn't want to add anymore because it'd feel like too much info."
Since the labels include a QR code, we followed up with another question where we asked participants to rate the level of information they saw after scanning the QR code (Q25). This question was only asked of participants who indicated that they had scanned the QR code. Across all three conditions, an overwhelming majority reported that the secondary layer label that was shown had the right amount of information (72.9%, 81.8%, and 76.9% of low-, medium-, and high-complexity group participants, with no statistically significant difference between conditions). Nonetheless, a small percentage of participants still reported not having enough information, and a few (< 8%) thought the secondary layer contained too much information.
After responding to all questions related to their assigned label, participants were shown four labels: the low-, medium-, and high-complexity labels, as well as the ultra-high-complexity label shown to participants who scanned the QR code or clicked the button on the high-complexity label. Participants were then asked to select the label they would most like to see on product packaging (Q28). As shown in Figure 20, participants overwhelmingly did not want to see the low-complexity label: only 6 out of 518 respondents selected it as their top choice. The most popular option was the high-complexity label, which was selected by 42.1% of participants across all conditions, followed by the ultra-high-complexity label at 32.8%, with the medium- and low-complexity labels accounting for the remaining 23.9% and 1.16%, respectively. Interestingly, those in the medium-complexity group were less interested in seeing the medium-complexity label than those in the other two groups (p = 0.003 between low and medium, and p = 0.031 between medium and high), perhaps because they had experienced using it and were more aware of its limitations.
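The Q28 preference shares can be cross-checked against the stated sample size. The sketch below uses only the figures reported above (518 respondents, 6 choosing the low-complexity label, and the four percentages); the implied per-label counts are a back-calculation for illustration, not numbers reported by the study.

```python
TOTAL_RESPONDENTS = 518
LOW_COMPLEXITY_COUNT = 6  # reported directly in the text

# Reported shares of respondents preferring each label (percent).
reported_shares = {
    "high": 42.1,
    "ultra-high": 32.8,
    "medium": 23.9,
    "low": 1.16,
}

# 6 of 518 expressed as a percentage, to compare with the reported 1.16%.
low_share = 100 * LOW_COMPLEXITY_COUNT / TOTAL_RESPONDENTS

# Back-calculated counts (an inference from rounded percentages).
implied_counts = {
    label: round(share / 100 * TOTAL_RESPONDENTS)
    for label, share in reported_shares.items()
}

print(f"low-complexity share: {low_share:.2f}%")
print(f"sum of reported shares: {sum(reported_shares.values()):.2f}%")
print(f"implied counts: {implied_counts}")
print(f"implied total: {sum(implied_counts.values())}")
```

The shares sum to just under 100% because of rounding, and the implied counts add back up to the full sample, which suggests the reported percentages are internally consistent.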
We followed up with a question asking participants to explain the reasons behind their label choice (Q29). 46.4% of participants who chose the medium-complexity label and 53.7% of participants who chose the high-complexity label mentioned the amount of information as one of their reasons. Many participants considered the medium-complexity label a good balance between too little and too much information. One participant who chose the medium-complexity label stated, "I feel like it has a good amount of basic information to go off of. If I needed more, I would look it up online. The [low] felt like it had almost no information, and [high] and [ultra-high] felt like information overload." Participants who selected the high-complexity label shared similar views while also complimenting the high-complexity label for presenting up-front information without having to scan the QR code. One participant added "it's a lot faster for me to read the label that is already there, as opposed to scanning a QR code. Also I am a little wary of scanning random QR codes unless I already know that I can trust the source, as I have heard about malicious QR codes." More than 80% of the participants who chose the ultra-high-complexity label said they preferred the label because it contained a lot of information. One of these participants added, "It has the most detailed information. I almost picked label [high] but label [ultra-high] had some of the information that I was looking for that label [high] did not have."
We asked participants for potential improvements to the label that was shown to them throughout the survey. For the low-complexity group, about a third of participants asked about what the Cyber Trust Mark means and wanted a clearer explanation regarding what "more info" entails, so that they would know what to expect after scanning the QR code. 24.0% of participants who saw the low-complexity label indicated wanting the label to contain at least some basic information without having to scan the QR code. According to one participant, "I want more information on the label itself. Many people do not know how to use QR codes, or do not have the technology or experience to use it." Participants from the medium-complexity group were specifically interested in knowing more about shared or sold data, with nearly 30% of them explicitly asking for that information to be presented on the package label. In addition, 17.4% of medium-complexity participants asked for more security/privacy-related information, and 19.8% asked for more information generally. For participants from the high-complexity group, fewer mentioned wanting more information of any kind, and 8.8% of them said they wanted to reduce the amount of content on the label. One of them stated, "it feels very busy. I don't know where to look, It should be like amazon or youtube where you know where the information is on the page. Perhaps remove sensor type. We know microphones capture sound. Prioritize shared with, sold to, data stored on cloud. Put everything else in the qr code." Across all conditions, participants had minor design suggestions related to fonts, color, layout, and other design features.
Next, we presented participants with the same four label layers and asked which they would like to see after scanning the QR Code (Q30). The ultra-high-complexity label was selected by 48.8% of participants across all groups, followed by the high-complexity label selected by 35.1% of all participants. The responses had no significant differences between complexity groups.

U.S. Cyber Trust Mark Education (RQ5a)
Existing labeling programs, such as the Energy Star label for energy efficiency, were supported with extensive education campaigns to help consumers know what to look for when purchasing appliances [54]. As the U.S. Cyber Trust Mark is not yet available on packages, consumers are not yet familiar with its purpose and meaning. We developed a simple educational intervention (shown in Figure 22) to explain the purpose of the U.S. Cyber Trust Mark and QR code and showed it to half our participants across all label groups. We did not allow them to proceed further in the study until they correctly answered questions to confirm they had a basic understanding of the Trust Mark and QR code. We asked these questions again to all participants later in the survey and compared the accuracy rate of participants who received the educational intervention with those who did not. Further, we examined whether the educational intervention impacted whether participants scanned the QR codes, their expectation of what they would see if they scanned the QR codes, their self-reported understanding of the Trust Mark, and their self-reported assessment of the Trust Mark's influence on their purchase decisions.
After answering three survey questions, all participants, regardless of educational intervention, were shown the question about the purpose of the Cyber Trust Mark (Q4). Participants in the education group significantly outperformed those in the no-education group across all three label complexities. As shown in Figure 21, 84.8% of all participants in the education group selected the correct option, which was that the presence of the mark meant that the device met baseline security and privacy requirements, as compared to only 16.5% in the no-education group (p = 0.003). Over half the participants in the no-education group incorrectly believed that the mark indicated that the device had been tested and certified by an independent organization or the government.
We used web server log data to analyze whether education affected the number of QR codes participants scanned (shown in Figure 15). We found that across all participants, those in the education group scanned the QR code significantly more times (p = 0.012) than those in the no-education group. Participants in the education group scanned an average of 0.813 times, while those in the no-education group scanned an average of 0.276 times, a nearly three-fold increase. Moreover, we examined the effect of education within each complexity group, finding a significant difference only for the low-complexity group (p = 0.003), with 23.7% of education-group low-complexity participants scanning at least once compared to 11.1% of non-education-group low-complexity participants.
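The "nearly three-fold" description follows directly from the two reported means. A short check, using only the averages stated above (the variable names are illustrative):

```python
# Mean number of QR scans per participant, as reported in the text.
EDUCATION_MEAN_SCANS = 0.813
NO_EDUCATION_MEAN_SCANS = 0.276

# Ratio of the two means: how many times more often the education
# group scanned, on average, than the no-education group.
scan_ratio = EDUCATION_MEAN_SCANS / NO_EDUCATION_MEAN_SCANS

print(f"education / no-education scan ratio: {scan_ratio:.2f}")
```

The ratio is large in relative terms, but both means are below one scan per participant, which is consistent with the modest effect on scanning behavior described in this section.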
During the latter half of the survey, participants who did not scan the QR code were asked what information they expected to see after scanning the QR code (Q13). The results shown in Figure 23 indicate that more participants in the education group answered the question correctly (71.7%) as compared to those who did not receive the education (28.6%, p = 0.003). We found no statistically significant relationships across complexities for both of these questions.
As discussed in previous sections, we asked participants to rate their level of understanding of each label attribute and the extent to which these attributes would influence their purchase decisions (Figures 13 and 17). We found that participants who were exposed to education on the Cyber Trust Mark indicated having a better understanding of the Trust Mark (p < 0.001), with an average Likert rating of 3.82 on a 1-5 scale compared to 2.8 for the no-education group. Those in the education group also rated the Trust Mark as having more influence on their purchasing decision (p = 0.003).

Effects of Demographic Factors (RQ5b, RQ5c, and RQ5d)
In the prescreening survey, we asked participants to report whether they had any education or experience in engineering, computer science, or similar technical fields to evaluate the impact of technical backgrounds on survey responses. For the vast majority of the questions, we found no statistically significant difference between participants with or without a technical background in label comprehension, label preference, and scanning behavior. The only significant differences were that participants with a technical background reported a higher tendency to read privacy policies (an average of 3.24 on a scale of 1 to 5 compared to 2.78 for those without technical experience, p < 0.001) and a greater motivation to keep their online data and accounts safe (4.08 and 3.78 for participants with and without technical experience, respectively, p = 0.012). These results suggest that our label designs work similarly regardless of technical background.
We tested whether age is a determining factor in label comprehension, label preference, and consumer behavior. We found that young people aged 18-35 were less willing to scan QR codes compared to the other two age groups. On a scale of 1 to 5, the mean Likert score of willingness to scan for the 18-35 age group was 2.7, compared to 3.23 for the 36-53 age group and 3.37 for the 54+ age group (p = 0.003 between 18-35 and 36-53, and p < 0.001 between 18-35 and 54+ age groups). Participants aged 18-35 also reported being less likely to take privacy-related actions, including reading privacy policies (2.61 for participants aged 18-35 compared to 3.19 for those aged 36-53, and 2.96 for ages 54+, p < 0.001), and being less motivated to take steps to ensure online privacy (3.64 for 18-35 compared to 4.09 for 36-53 and 3.9 for 54+, p < 0.001). When asked about which label they prefer, our results showed that young people differ significantly from other age groups (p = 0.005), preferring the medium-complexity label more than people in the 36-53 and the 54+ age groups (33.0%, 18.1%, and 20.4% of the groups, respectively), with a larger percentage of people in the latter two groups preferring the high- or ultra-high-complexity labels (66.5%, 79.1%, and 79.6% of the groups, respectively). In addition, we found several numerically small but statistically significant differences between age groups for some survey questions without clear trends in either direction.
Based on participants' responses in the prescreening survey, we divided participants into male and non-male groups and tested for the impact of gender differences concerning RQ1, RQ2, RQ3, and RQ4. We found no significant differences between the two groups.

DISCUSSION
Our results demonstrate that IoT purchasers are interested in learning more about the security and privacy of devices and they would like to see this information on product packaging. As detailed below, our participants had a strong preference for higher complexity labels and were almost unanimously unsatisfied with the low-complexity label. We found that our participants disliked accessing critical information by QR codes, and we observed that comparing labels on a phone screen is awkward. Without education, we found substantial confusion about the Trust Mark and QR code. Our simple educational intervention improved understanding of the QR code's purpose but had a relatively modest effect on QR code scanning. Finally, our results support the need for including privacy information along with security information on package labels. As the U.S. FCC defines requirements for using the U.S. Cyber Trust Mark and considers designs for accompanying IoT labels, the CISPL label (which we used for our high-complexity label) [57] or a simplified version, similar to the medium-complexity label we tested, presents a deployable baseline that can be refined as new requirements are articulated. However, as label designs evolve, further testing is critical to ensure labels meet consumer needs.
Strong preference for higher complexity. Study participants in all label conditions were overwhelmingly opposed to the low-complexity label that required scanning a QR code in order to obtain any security or privacy information. Indeed, only 6 out of 518 participants indicated they most preferred the low-complexity label. Most preferred to see the high-complexity label on product packaging, although some preferred the ultra-high-complexity label and some preferred the simpler medium-complexity label. In general, participants preferred more information, regardless of which label they were shown, educational intervention, their age, or whether they had a technical background.
The medium- and high-complexity labels performed similarly, although there are some tradeoffs between them. The high-complexity label was more often preferred by participants and contained more information that might be needed to compare products, but participants made fewer comparison errors using the medium-complexity label. Both types of labels might be offered as options for manufacturers to use depending on available space on product packaging.
Usability issues with QR codes. While more participants scanned the QR code in our low-complexity condition when they could not obtain any information otherwise, most did not scan and said they would be reluctant to do so in the future. Some mentioned the inconvenience of scanning, while others were concerned that QR codes might not be secure. We note that even if participants were to scan QR codes, comparing labels on a small phone screen is difficult and would likely require going back and forth between labels and re-scanning multiple times, something we observed several participants doing. Emami-Naeini et al. have proposed a comparison tool that could produce a compact table for consumers comparing a small number of devices against their preferred criteria [16]. While such a tool would make it easier for consumers to compare labels on a phone, it would still be useful to have the information directly on the product packaging. Indeed, consumers are used to seeing food nutrition labels, light bulb energy labels, and other consumer labels on packages, allowing easy side-by-side comparison.
Education is Key. As it is infeasible to fit all possible privacy and security information on a physical label while also keeping it up to date, a QR code (or a URL) is required to link more comprehensive labels for users, regulators, or experts. However, as we've found, simple linkage isn't enough. Consumers will need to be educated about what the U.S. Trust Mark implies, and how scanning the QR code leads to more security and privacy information about products, ultimately leading to better-informed purchase decisions. Wording improvements on the label or the Trust Mark might improve clarity and serve to nudge people to scan label QR codes. Furthermore, in-store signage (e.g., on store shelves) next to IoT products might help educate consumers. Prior efforts such as the Energy Star and Energy Guide Labels were supported by multi-year educational campaigns to inform consumers. IoT labels are arguably more complex and might require even larger investments in education. Encouragingly, we show that our educational intervention improved the understanding of the Trust Mark and what consumers could find upon scanning the QR code. This understanding also translated to behavior; we saw a modest increase in the number of participants who scanned the QR codes, particularly in the low-complexity condition. However, most participants were still not motivated to scan the QR code, regardless of educational intervention.
Security and Privacy. Participants were interested in seeing a range of privacy and security attributes on the label, and seemed especially interested in privacy-related attributes that would inform them about what data would be collected, utilized, and shared. This is particularly important since the criteria that the NIST IR 8425 document [47] lists as the "baseline criteria" that may drive the requirements to get the Cyber Trust Mark are mostly security-focused, with no explicit mention of privacy factors. Based on our study and similar findings in prior work [15,16], privacy factors such as which sensors devices have, whether data is sold, and how it will be used are critical to include on the package label itself. Furthermore, privacy information may be essential for the U.S. label to be recognized internationally, given that several countries are basing their own requirements around the ETSI 303645 standard [10], which explicitly discusses privacy factors.

CONCLUSION
We studied the effectiveness of high-, medium-, and low-complexity versions of an IoT security and privacy label designed for product packaging. Each version included the newly introduced U.S. Cyber Trust Mark and a QR code, with the medium- and high-complexity versions including additional security and privacy information. We conducted a 518-participant online study in which participants were randomly assigned a label complexity level and asked to use the labels to compare three functionally similar smart thermostats. Half the participants received a brief educational intervention at the beginning of the study, informing them about the purpose of the U.S. Cyber Trust Mark and accompanying QR code. At the end of the study, participants were shown labels of all three complexity levels along with an ultra-high-complexity label. We investigated the impact of label complexity level on consumers' understanding of label information, interactions with labels, and preferences for the labels. In addition, we investigated which label attributes were most influential. Finally, we explored the impact of the brief educational intervention, age, gender, and technical background on understanding, interactions, and preferences. Our findings show that participants strongly favored the higher-complexity labels and were reluctant to scan the QR codes, regardless of age, gender, or technical background. They reported finding a range of privacy and security attributes influential. While our educational intervention improved understanding of the purpose of the Trust Mark and QR code, our results suggest that it had only a small impact on motivation to scan the QR code.

Figure 1: Possible participant interactions with the labels.
Is a Trustmark and QR Code Enough? Chen, et al.
(a) Medium-complexity label for Sustios (b) Medium-complexity label for All4home (c) Medium-complexity label layer for EcoHouse

Figure 6: Medium-complexity labels for the three smart thermostats. Sustios has the best privacy and security attributes, followed by All4home. EcoHouse has the worst security and privacy attributes. The low-complexity labels were formatted the same as the left side of the medium-complexity labels.
(a) High-complexity label for Sustios (b) High-complexity label for All4home (c) High-complexity label for EcoHouse

Figure 7: High-complexity labels for three smart thermostats. Sustios has the best privacy and security attributes, followed by All4home. EcoHouse has the worst security and privacy attributes.

Figure 8: Q32 - How well do you agree with each of the following statements?

Figure 10: Q5 - Which device uses a camera or other visual sensor? The correct answer is EcoHouse.

Figure 12: Q8 - Which device provides consent-based security updates? The correct answer is Sustios.

Figure 14: Q1 - If you saw these three labels on their products' packaging, would you consider them as you shop? Which of the following actions would you take? Participants saw all three labels corresponding to their assigned complexity group. An asterisk (*) indicates a statistically significant difference between label complexity groups. Complete statistical results can be found in Appendix F.

Figure 15: Number of QR codes scanned by label complexity group and groups who did and did not receive educational intervention. There were statistically significant differences between the education and no-education groups as well as between the low-complexity group with educational intervention and the low-complexity group without educational intervention.

Figure 17: Q20 - How much does each of the attributes influence your purchase decision? We tested all attributes for differences between educational groups and found significant differences only for the Cyber Trust Mark.

Figure 19: Q24 - What do you think about the amount of information on the labels you were shown above?

Figure 20: Q28 - When you are shopping for an IoT device, which of the four label designs above would you be most interested in seeing on the product packaging? Participants saw labels 1, 2, 3, and 4 as options, which correspond to the low-, medium-, high-, and ultra-high-complexity labels.

Figure 21: Q4 - Which of the following do you think best describes what the presence of the Cyber Trust Mark on the label represents? The correct answer is "This IoT device passes minimum security and privacy requirements."

Figure 22: The brief educational intervention was randomly shown to half of the participants.

Figure 23: Q13 - Which of the following best describes what you would expect to find after scanning the QR code? Only those who did not scan were asked this question. The correct response is "More information about the device's privacy and security."

Table 1: Demographic distribution of participants across label complexity groups.
Which of the following best describes why you wouldn't be likely to scan the QR code? This question was only shown to participants who did not scan the QR code.

Table 2: The table contains the Kupper-Hafner agreement rate for qualitative coding.
If you saw a label like this when actually shopping for an IoT device, how likely would you be to scan the QR code for more information?

When you are shopping for an IoT device, which of these label designs (if any) would you like to see after you scan the QR Code on the label on product packaging? (Note: you must select a different label design from what you selected above.)
