On the Feasibility of Predicting Users' Privacy Concerns using Contextual Labels and Personal Preferences

Predicting users’ privacy concerns is challenging due to privacy’s subjective and complex nature. Previous research demonstrated that generic attitudes, such as those captured by Westin’s Privacy Segmentation Index, are inadequate predictors of context-specific attitudes. We introduce ContextLabel, a method enabling practitioners to capture users’ privacy profiles across domains and predict their privacy concerns towards unseen data practices. ContextLabel’s key innovations are (1) using non-mutually exclusive labels to capture more nuances of data practices, and (2) capturing users’ privacy profiles by asking them to express privacy concerns about a few data practices. To explore the feasibility of ContextLabel, we asked 38 participants to express their thoughts in free text towards 13 distinct data practices across five days. Our mixed-methods analysis shows that a preliminary version of ContextLabel can predict users’ privacy concerns towards unseen data practices with an accuracy (73%) surpassing that of the Privacy Segmentation Index (56%) and of methods using categorical factors (59%).


INTRODUCTION
Predicting users' privacy concerns towards unseen data practices can significantly improve today's privacy ecosystem. First, businesses can use these predictions to understand whether their data collection and usage approaches are in accordance with individuals' privacy expectations [30,33]. Second, developers and HCI researchers may leverage these predictions to empower users with personalized privacy management tools, enabling them to have default privacy settings tailored to their specific attitudes [40,51]. Third, policymakers can leverage these predictions to better understand the privacy protection needs of the population and design more effective privacy regulations [1,53].
However, modeling users' privacy concerns is challenging. Typical modeling processes can be decoupled into two common steps: capturing users' privacy profiles, then predicting their privacy concerns using the captured profiles [27,34,37,48]. For example, Alan Westin developed the Privacy Segmentation Index, which consists of a few questions and a set of rules that group participants into three categories based on their responses (fundamentalists, pragmatists, and unconcerned) [27]. However, many have found that Westin's categories have limited effectiveness in predicting people's privacy attitudes towards specific data practices [4,17,27,35,52].
Recently, many studies have found that contextual factors within a data practice significantly impact users' privacy attitudes [20,42,51,52,54]. Researchers have also started to use factorial vignette experiments to capture users' domain-specific privacy profiles [5,10,11,28,33,37,42,54], revealing the impact of contextual factors on users' privacy concerns or privacy norms. For example, Emami-Naeini et al. analyzed how a few domain-specific categorical factors (e.g., data types, purposes) affect users' comfort levels in the context of IoT data collection [37]. However, these models and captured factors can hardly be generalized to data practices in a different domain, such as applying an IoT model to forecast user comfort levels in targeted advertising [32,41].
This paper introduces ContextLabel, a new method that enables practitioners to capture users' privacy profiles across domains and predict their privacy concerns towards unseen data practices. ContextLabel has two key ideas. First, rather than simplifying a real-world data practice to a limited set of exclusive categorical factors, ContextLabel asks practitioners to annotate a data practice with multiple non-exclusive labels (e.g., 'Price Discrimination', 'Absence of Consent'). We hypothesize that incorporating a wider array of labels, which capture the nuances of the specific data practice under consideration, will lead to more accurate modeling of users' privacy concerns across domains. Second, ContextLabel captures users' privacy profiles by asking them to express, in free text, their privacy concerns about a few data practices, and then predicts their attitudes towards an unseen data practice using its associated labels. We hypothesize that users' discomfort towards data practices may correlate with different contextual labels and personal preferences [20,42,52,54], and we aim to leverage this correlation to predict users' privacy concerns towards an unseen data practice. For example, if a user strongly opposes data-driven price discrimination in airline bookings, she may also hold negative views towards another data practice involving price discrimination.
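To make the two ideas concrete, the following is a minimal sketch, not the model evaluated in this paper. It assumes each data practice is annotated with a set of labels, treats a rating below 3 on the five-point comfort scale as discomfort, estimates a user's per-label sensitivity as their mean discomfort over rated practices carrying that label, and flags an unseen practice as concerning when the average sensitivity over its labels exceeds a threshold. The function names, the 0.5 prior for labels the user has never encountered, the threshold, and the toy annotations are all hypothetical.

```python
from statistics import mean


def label_sensitivity(rated_practices):
    """Estimate one user's per-label discomfort rate.

    rated_practices: list of (labels, rating) pairs, where labels is a set of
    non-exclusive label names and rating is a 1-5 comfort score.
    """
    discomfort = {}
    for labels, rating in rated_practices:
        # Hypothetical rule: ratings below 3 count as discomfort.
        uncomfortable = 1.0 if rating < 3 else 0.0
        for label in labels:
            discomfort.setdefault(label, []).append(uncomfortable)
    return {label: mean(values) for label, values in discomfort.items()}


def predict_concerned(sensitivity, unseen_labels, prior=0.5, threshold=0.5):
    """Predict whether the user is uncomfortable with an unseen practice.

    Labels absent from the user's profile fall back to a hypothetical prior.
    """
    scores = [sensitivity.get(label, prior) for label in unseen_labels]
    if not scores:
        return prior > threshold
    return mean(scores) > threshold


# Toy profile: the user rejects airline price discrimination but is
# comfortable with a practice labeled only 'Algorithmic Assessment Imperfections'.
profile = label_sensitivity([
    ({"Price Discrimination", "Absence of Consent"}, 1),
    ({"Algorithmic Assessment Imperfections"}, 4),
])
print(predict_concerned(profile, {"Price Discrimination", "Financial Loss"}))  # True
print(predict_concerned(profile, {"Algorithmic Assessment Imperfections"}))  # False
```

Averaging per-label scores keeps the sketch transparent; any real model would also need to weigh labels differently and handle sparse profiles more carefully.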
We conducted an online survey involving 38 participants over five days to explore the feasibility of ContextLabel. Each day, we prompted participants to express their thoughts in free text regarding three different data practices and to rate their overall comfort level quantitatively. Overall, each participant examined 13 distinct data practices. To gauge the consistency of users' privacy perceptions regarding a specific data practice, we asked participants to evaluate one repeated data practice and to respond to questions from Westin's Privacy Segmentation Index [27] on the first, third, and fifth days of the survey.
To predict users' privacy attitudes towards an unseen data practice, we assume that users' perspectives on privacy for a specific data practice show a certain level of logical reasoning. We investigated this assumption by assessing whether individuals rated the same scenario consistently three times at different intervals, and by analyzing their free-text explanations for these ratings. We found that participants' privacy attitudes are highly consistent over a span of five days. Overall, 80% of participants expressed the same level of privacy concern in the generic privacy segmentation questions, while the remaining 20%, who were less consistent, mostly had borderline perceptions. Further, we found that each participant mentioned nearly identical concerns in their free-text responses regarding the repeated privacy scenario across the three days.
We then developed a codebook with 18 distinct labels and used it to annotate all 13 data practices. Next, we identified contextual labels that correlated with users' privacy concerns across domains. Our results show that some users have heightened sensitivity to particular contextual labels, leading to specific privacy concerns. Using these insights, we captured users' inclinations concerning various contextual labels by assessing their responses to several data practices, and then used the captured preferences to predict their concerns about previously unseen data practices. Our predictive model achieved higher prediction accuracy (73%) than the Privacy Segmentation Index (56%) and categorical contextual factors (59%).
Scope and Limitation: Our paper is an exploratory work that studies the feasibility of predicting users' privacy concerns across domains. Note that the 18 labels provided are an initial set. We anticipate that subsequent research may introduce more labels by extending the codebook. A key advantage of ContextLabel is that researchers can re-label user feedback using an extended codebook without having to discard prior survey responses.
In this paper, our main contributions are as follows:
• We empirically demonstrated the feasibility of predicting users' privacy concerns across domains.
• We introduced a preliminary method to model users' privacy concerns using non-mutually exclusive labels and users' preferences. This method can predict users' privacy concerns towards unseen data practices with an accuracy that surpasses the Privacy Segmentation Index and methods using categorical factors.
• We conducted a systematic study of users' privacy preferences concerning data practices and general questions over a brief period (five days), using a two-stage data collection method. We contribute new evidence that users' preferences are largely consistent and exhibit some level of rationality.

RELATED WORK & RESEARCH QUESTIONS
The main objective of this paper is to predict users' privacy attitudes using contextual labels and personal preferences. This objective rests on three fundamental premises: (1) users' privacy concerns for a specific data practice show a certain level of logical reasoning, which can be approximated as a function of users' individual privacy preferences and the context of the data practice; (2) by incorporating a broader selection of non-mutually exclusive labels, we can capture more nuances of a data practice; and (3) instead of running factorial vignette experiments, it is feasible to capture users' privacy profiles by analyzing their open-ended feedback on selected data practices. We have organized related studies around these premises.
Modeling users' privacy concerns. Many studies have investigated whether users are rational in context-specific, privacy-related behaviors, either actual or intended [2,18,52]. For example, the Privacy Calculus [29] assumes that users are rational beings whose decisions and actions are driven by the intent to maximize their benefits: when the anticipated benefits of data sharing exceed the costs, users are generally expected to share their data willingly. In contrast, numerous studies on consumer decision behavior have shown that the decision-making process is influenced by various cognitive biases and heuristics [3,23,50], such as availability bias [43], the framing effect [49], and confirmation bias [39]. Flender and Müller put forth a further contrasting proposition [16]: a decision's outcome is not settled until the moment the decision is actually made [24], and two distinct decisions cannot be deemed interchangeable within the context of decision-making [16].
We hypothesize that users' privacy concerns are consistent over a short period and exhibit some level of rationality. To validate this hypothesis, we asked participants to repeatedly express their privacy concerns about a data practice and to respond to Westin's Index questions. We then analyzed whether their numerical ratings were consistent across multiple responses. Further, we analyzed participants' open-ended responses to infer whether these responses demonstrate rationality.
RQ1: How much rationality can we observe in users' privacy attitudes and concerns toward a specific data practice?
Non-mutually exclusive contextual labels vs. categorical contextual factors. One of the most widely used theoretical frameworks for modeling users' privacy decisions is Nissenbaum's contextual integrity [38]. This theory posits that privacy choices are guided by specific information norms tied to particular contexts. Traditionally, such contexts can be delineated using certain categorical factors, as highlighted in [10,14], including actors, attributes, and transmission principles. While many studies have found that these contextual factors within a data practice significantly impact users' privacy attitudes [6,8,9,14,31,37,38,46], it remains hard to use these factors to predict users' privacy concerns across domains.
We hypothesize that simplifying a real-world data practice to a small set of exclusive categorical factors might overlook its intricate nuances.Instead, we want to explore a new method that annotates a data practice with multiple non-exclusive labels to improve the prediction models.
RQ2: How are different context labels and categorical factors correlated with users' privacy attitudes and concerns across domains?
Capturing users' preferences and predicting users' attitudes.
Many studies seek to model users' privacy attitudes across domains using demographic information [12] such as education, gender, age, and ethnicity [17,35,36,52], or personality traits [19,52]. However, few studies have found demographic predictors to be effective [52]. Other studies have used the widely adopted general questions from the Westin Privacy Index [27] to categorize participants into three groups with different privacy attitudes. However, there is no evidence that either the individual questions or the derived categories are predictive of participants' reactions to specific scenarios [52].
More recently, researchers have started to use factorial vignette surveys to profile users' privacy decisions and attitudes [7,10,31,33,35,37,47]. Researchers often identify a few common factors (e.g., data types) in a specific task setting (e.g., IoT, mobile permissions) and leverage these categorical factors to generate or control numerous tested scenarios. For instance, Emami-Naeini et al. conducted a 1,007-participant vignette study to capture users' privacy expectations in 380 IoT use-case scenarios [37]. Liu et al. [33] analyzed the privacy and security decisions of smartphone users who were asked to choose between "granting", "denying", or "requesting to be dynamically prompted" for 12 permissions of the apps they downloaded. Schechter et al. [42] conducted a study examining users' reactions to a modified version of the Facebook Emotion Contagion Experiment [26]. Serramia et al. [44] selected factors such as data types, recipients, and transmission principles to generate smart device scenarios, and leveraged a collaborative filtering approach to predict user preferences. Similarly, Abdi et al. [1] applied data mining to find which contexts in the Smart Home Personal Assistants ecosystem shared attributes and had the same acceptability. These studies were able to investigate users' preferences [1,26,37,44] or identify meaningful user profiles [33]. However, it is challenging to adapt such prediction models to a new domain [10], as the tested scenarios stem from domain-specific factors and researchers need to collect data again for the new domain.
Instead, we aim to capture users' preferences regarding different contextual factors (e.g., 'Price Discrimination', 'Absence of Consent') across diverse domains. We then use these preferences to predict their attitudes towards other, unseen data practices.
RQ3: How can contextual factors and personal preferences be effectively captured, and to what extent can they predict users' privacy concerns towards unseen data practices?

METHOD
We conducted a five-day online study on Amazon Mechanical Turk (AMT) to collect participants' privacy attitudes towards 13 selected scenarios. We worked to deliver two main outputs: (1) using non-mutually exclusive labels to capture more nuances of a data practice, and (2) capturing users' privacy profiles by asking them to express their privacy concerns about a few data practices. These outputs revealed participants' privacy rationale (see Section 4.1) and the correlation of ContextLabel with privacy attitudes and concern categories (see Section 4.2). ContextLabel predicts privacy concerns with promising accuracy (73%) (see Section 4.3).

Survey Sessions
Survey Design. Figure 1 shows the overall survey structure. Each participant was required to complete one survey per day for five consecutive days. This methodology allowed us to collect privacy attitudes and concerns from the same participant across various scenarios and evaluate their consistency. Spreading the workload over five days instead of one was also intended to help participants maintain their engagement and focus throughout the study. To obtain accurate assessments of participants' privacy attitudes, we split the chosen scenarios into different data actions (Figure 2, left), namely data collection, data processing, data sharing, and data usage. Next, we generated five different questionnaire sets, one for each day, using the scenarios described in the next section. Each set comprised three distinct scenarios from the 13 cases in Table 1; the corresponding data actions are listed in Appendix C.
The main part of our surveys contained a consent page, tutorial examples, and scenario evaluation. On the tutorial page, participants were shown example answers for a scenario in which an insurance company shares customers' health data with third parties [52]. Then, respondents rated their comfort level towards the data actions in each scenario on a five-point Likert scale (1 = Extremely uncomfortable, 5 = Extremely comfortable). After rating, participants were asked to express their concerns and reasons in free-form text (Figure 2, middle). To enable efficient quantitative analysis, we forwarded the collected responses to other crowd workers, who annotated them with 14 predefined privacy concern categories [20]. Crowd workers were asked to review the listed privacy concerns and either select the relevant options or provide additional information to indicate which concerns were expressed in the free-form text responses (Figure 2, right).
To investigate whether participants' attitudes towards the same scenario remained consistent within a short time span and to unveil any rational reasoning behind their responses, we designed a consistency test, whose analysis is discussed further in Section 4.1. Specifically, we chose one scenario (Table 1, scenario 6) and integrated it into the surveys of the first, third, and fifth days. At the beginning of those three surveys, we also included three frequently used questions from Westin's Privacy Segmentation Index [27,52], asking participants to rate generic privacy-related statements as follows: For each of the following statements, how strongly do you agree or disagree? [1: Strongly Disagree, 2: Somewhat Disagree, 3: Somewhat Agree, 4: Strongly Agree]: (1) Consumers have lost all control over how personal information is collected and used by companies. (2) Most businesses handle the personal information they collect about consumers in a proper and confidential way. (3) Existing laws and organizational practices provide a reasonable level of protection for consumer privacy today. Although Westin's index has been deemed ineffective in capturing participants' privacy attitudes in previous works [4,17,27,35,52], to our knowledge, no study has examined respondents' consistency in answering questions of Westin's index within a short time span. Therefore, we repeated these three questions together with the redundant scenario for our consistency test. The responses also served as our baseline for predicting participants' attitudes and concerns in subsequent analysis. Additionally, unlike typical surveys with attention-check questions, we leveraged the free-text field [22] and the consistency test to prevent random responses.
Figure 1: An overview of the five-day survey protocol for capturing users' privacy profiles by asking them to rate and express their concerns about data practices. Surveys 2 and 4 include three distinct stories. Surveys 1, 3, and 5 include the three questions from Westin's Privacy Segmentation Index and three stories, one of which is the repeated (redundant) one. The repeated scenario and questions are designed for the consistency test.
Table 1 (excerpt):
Case | Name | Description
12 | Travel service dynamic pricing | A travel agency collects users' device data to adjust the service price dynamically.
13 | Ride-share dynamic pricing | A ride-sharing app collects users' device battery data to adjust the service price dynamically.
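For reference, the sketch below applies the segmentation rule commonly attributed to Westin's index to the three questions above: fundamentalists agree with statement (1) and disagree with statements (2) and (3); the unconcerned disagree with (1) and agree with (2) and (3); everyone else is a pragmatist. Treating ratings 3 and 4 on our four-point scale as "agree" is an assumption about how the scale maps onto that rule, and the function names are hypothetical.

```python
def _agrees(rating):
    """Assumption: 3 (Somewhat Agree) and 4 (Strongly Agree) count as agreement."""
    return rating >= 3


def westin_segment(q1, q2, q3):
    """Classify one respondent from three 1-4 agreement ratings.

    q1: consumers have lost all control; q2: businesses handle data properly;
    q3: laws and practices provide reasonable protection.
    """
    for q in (q1, q2, q3):
        if q not in (1, 2, 3, 4):
            raise ValueError("ratings must be integers from 1 to 4")
    if _agrees(q1) and not _agrees(q2) and not _agrees(q3):
        return "fundamentalist"
    if not _agrees(q1) and _agrees(q2) and _agrees(q3):
        return "unconcerned"
    return "pragmatist"


# Hypothetical answers from one respondent on days 1, 3, and 5; a change of
# segment across days signals inconsistency in the generic index.
days = [(4, 1, 2), (4, 2, 2), (3, 2, 3)]
print([westin_segment(*answers) for answers in days])
# ['fundamentalist', 'fundamentalist', 'pragmatist']
```

Because the rule collapses the three ratings into a single label, a one-point shift on a borderline answer can move a respondent between segments, which is why the consistency test also looks at the raw ratings.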
Story Selection. To collect participants' privacy concerns across diverse domains, we selected 13 data practices (Table 1) from Jin et al. [20]. These practices cover areas such as the Internet of Things (IoT), e-commerce, social networks, advertising, computational psychology, data science experiments, and scenarios involving vulnerable populations. Originally, each practice involved multiple information applications, resulting in diverse outcomes. We assigned a single data application to each practice, enabling broader response collection across domains while mitigating participant fatigue. To prevent potential biases, we distributed scenarios that might give rise to similar concerns across separate surveys. To do so, we analyzed the general concerns in each scenario using results from Jin et al. [20]. Following their work, we categorized privacy concerns into three high-level classes: respect for persons, beneficence, and justice. We then identified 12 data applications in which concerns from one high-level class were more prevalent than those from the other two. Each application was assigned to its corresponding class, and applications representing distinct classes were included in each day's survey. Among the 13 practices analyzed, we found that in one particular scenario (case 6 in Table 1), concerns were distributed notably evenly across the three classes. This suggests that users have a more diverse range of concerns in this context, potentially reflecting varied rationales. We also observed that users often exhibit specific types of concerns in certain scenarios. For example, concerns about price discrimination are common in situations with evident unfairness in pricing (cases 12 and 13 in Table 1). However, the consistency of concerns in such special cases may not extend to more diverse domains. Therefore, we selected case 6 to test for consistency, as illustrated in Figure 1.
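The selection logic can be sketched as follows, assuming each candidate application comes with counts of concerns in the three high-level classes. An application is assigned to a class only when that class is strictly more prevalent than each of the other two, and evenness is measured with normalized Shannon entropy. The entropy measure, the function names, and the counts are illustrative assumptions, not the procedure or data used in the study.

```python
import math

CLASSES = ("respect for persons", "beneficence", "justice")


def dominant_class(counts):
    """Return the class strictly more prevalent than the other two, else None."""
    # sorted() with reverse=True is stable, so ties keep their CLASSES order.
    ranked = sorted(CLASSES, key=lambda c: counts.get(c, 0), reverse=True)
    if counts.get(ranked[0], 0) > counts.get(ranked[1], 0):
        return ranked[0]
    return None


def evenness(counts):
    """Normalized Shannon entropy: 1.0 when all three classes are equally common."""
    total = sum(counts.get(c, 0) for c in CLASSES)
    if total == 0:
        return 0.0
    probs = [counts.get(c, 0) / total for c in CLASSES if counts.get(c, 0) > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(CLASSES))


# Hypothetical concern counts for two candidate scenarios.
candidates = {
    "dynamic pricing": {"respect for persons": 3, "beneficence": 2, "justice": 15},
    "news filtering": {"respect for persons": 7, "beneficence": 6, "justice": 7},
}
for name, counts in candidates.items():
    print(name, dominant_class(counts), round(evenness(counts), 2))

# The most evenly distributed scenario is the candidate for the repeated test.
print(max(candidates, key=lambda n: evenness(candidates[n])))
```

Applications with a dominant class can then be spread across days so that each survey mixes classes, while the most even scenario is reserved for repetition.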
Recruitment and Demographics. We conducted the experiment on AMT from May to July 2023. To ensure the quality of survey responses, we only recruited participants aged 18 or above with a HIT Approval Rate greater than or equal to 95% and a Number of HITs Approved greater than or equal to 50. We also refined our survey design through two pilot studies with 5 participants, who were excluded from our official experiment. Based on the amount of data needed, we planned to recruit 35 to 40 participants. We ended up recruiting 42 participants located in the United States; 4 workers were removed from consideration for failing to finish all five surveys or for giving overly uniform answers to a large number of consecutive questions.
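The recruitment criteria and exclusion rules can be expressed as two small checks. This is a sketch under stated assumptions: the qualification values would come from each worker's AMT record, and the cutoff for "overly uniform" answers (here, ten or more identical consecutive ratings) is a hypothetical threshold, since the exact rule is not specified above.

```python
def eligible(approval_rate, approved_hits, age):
    """Recruitment criteria: approval rate >= 95%, >= 50 approved HITs, aged 18+."""
    return approval_rate >= 95 and approved_hits >= 50 and age >= 18


def longest_identical_run(ratings):
    """Length of the longest streak of identical consecutive ratings."""
    longest = current = 0
    previous = None
    for r in ratings:
        current = current + 1 if r == previous else 1
        longest = max(longest, current)
        previous = r
    return longest


def keep_response(surveys_completed, ratings, total_surveys=5, max_run=10):
    """Exclusion rules: all five surveys finished and no long straight-lining streak.

    max_run is a hypothetical cutoff for 'overly uniform' answers.
    """
    return (surveys_completed == total_surveys
            and longest_identical_run(ratings) < max_run)


print(eligible(97.5, 120, 34))                  # True
print(keep_response(5, [3, 3, 2, 4, 1, 1, 5]))  # True
print(keep_response(5, [2] * 12))               # False
```

Counting consecutive identical answers, rather than overall variance, targets straight-lining specifically while still allowing a participant who is uniformly uncomfortable across varied questions.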
On average, participants spent 30 minutes on each day's survey and received 30 USD as compensation for the five-day session. We collected demographic information in our first day's survey to ensure the diversity of our participants. Among the 38 workers providing valid answers, 24 (63%) identified as female and 14 (37%) identified as male. Participants' age buckets ranged from 25-34 to 65-74, with the largest group reporting being 35-44 years old (39%). There was also a wide range of reported educational degrees, with the largest group reporting a 4-year Bachelor's degree as their highest degree obtained (37%). Reported income ranged from less than $10,000 to $150,000 or more, with most reporting between $10,000 and $50,000 (58%).
Ethical Considerations. Our project was approved by the IRB at our institute. Participants read and signed an informed consent document before filling out the surveys. We instructed participants to focus on their own experiences and opinions and not to reveal private or sensitive information throughout the surveys. Collected data was stored in a secure location accessible only to the research team. We only collected participants' contact emails to compensate them for their time, did not connect these emails to the rest of the study data, and deleted them after the study was completed.

ContextLabel Codebook
Two authors annotated 43 data actions from the 13 scenarios using multiple labels, creating a codebook to capture contextual nuances. The synthesized labels are related to privacy concerns from previous works and are applicable across various scenarios, transcending specific domains.
Previous studies have utilized Contextual Integrity [38] with five categorical factors to model information flows. While not exhaustive, we annotated scenarios following this framework to cover our tested information flows; the factors used are listed in Table 3. In Sections 4.2 and 4.3, we compare the scenario-specific categorical factors with non-exclusive labels.
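The difference between the two annotation schemes shows up in how a single data action is encoded. In the sketch below, which uses case 13 from Table 1 as an example, each Contextual Integrity factor takes exactly one value and is one-hot encoded, whereas ContextLabel labels form a multi-hot vector in which several entries can be active at once. The factor names follow the commonly cited contextual integrity parameters (sender, recipient, subject, attribute, transmission principle); the factor values, the label subset, and the annotation itself are illustrative guesses, not the contents of Table 2 or Table 3.

```python
# Hypothetical vocabularies; the paper's actual factor values (Table 3) and
# 18 labels (Table 2) are not reproduced here.
CI_VALUES = {
    "sender": ["user device", "user"],
    "recipient": ["first party", "third party"],
    "subject": ["user"],
    "attribute": ["device data", "battery data", "health data"],
    "transmission principle": ["with consent", "without consent"],
}
LABELS = ["Price Discrimination", "Absence of Consent", "Financial Loss",
          "High Risk Probability", "Algorithmic Assessment Imperfections"]


def encode_ci(action):
    """One-hot per factor: exactly one value is active within each factor."""
    vector = []
    for factor, values in CI_VALUES.items():
        vector.extend(1 if action[factor] == v else 0 for v in values)
    return vector


def encode_labels(labels):
    """Multi-hot: any number of non-exclusive labels can be active."""
    return [1 if name in labels else 0 for name in LABELS]


# Illustrative annotation of the pricing action in case 13 (ride-share).
ride_share_ci = {"sender": "user device", "recipient": "first party",
                 "subject": "user", "attribute": "battery data",
                 "transmission principle": "without consent"}
ride_share_labels = {"Price Discrimination", "Absence of Consent", "Financial Loss"}
print(encode_ci(ride_share_ci))          # [1, 0, 1, 0, 1, 0, 1, 0, 0, 1]
print(encode_labels(ride_share_labels))  # [1, 1, 1, 0, 0]
```

The categorical vector always has exactly one active entry per factor, so its vocabulary is tied to the domain's values; the label vector can grow by adding labels to the codebook without changing how earlier practices are represented.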
Table 2: 18 selected non-exclusive labels and their definitions used in the annotation and analysis. The labels were synthesized from previous works [7, 15, 19-21, 35, 42, 54] and were derived from the information flow process and its consequences.

Label | Definition
Absence …
… [35,47]. We also identified a few labels from participants' reported concerns that are related to data processing but are not fully captured by contextual integrity labels, such as 'Algorithmic Assessment Imperfections' [21]. Besides the information flow process, studies have shown that in domains like IoT devices, perceived benefits can significantly impact people's privacy attitudes [5]. We hypothesized that people's privacy concerns can be influenced by attributes associated with the consequences of data actions, since these consequences ultimately determine whether the data actions lead to tangible harm to individuals. Therefore, labels such as 'Financial Loss' and 'High Risk Probability' are included. These labels have also been frequently mentioned in privacy-related works [7,19,20,35,42,54].
Out of all the identified labels, we selected 18 labels to evaluate their correlation with participants' privacy attitudes and concerns. The selected labels are listed in Table 2.

RESULTS
We collected 1,862 valid ratings and free-text responses for 43 distinct data actions in our surveys. The aggregated results for each scenario can be found in Appendix B. The distribution of participants' comfort or concern levels varies across different context labels, as depicted in Figure 3. Participants' concern categories also vary, as shown in Figure 4. Notably, some participants (lower portion of Figure 4) exhibited an overall lower level of concern but displayed sensitivity to specific concern categories (i.e., cells in the lower portion of Figure 4 with warm colors). This highlights nuanced privacy profiles that may not be captured by generic indices like Westin's index.
Privacy concern refers to an expression of worry about a specific privacy-related situation [13]. In the following sections, 'privacy attitude' refers to participants' numerical comfort ratings for data actions, while 'concern categories' represent the specific types of concerns expressed by participants, as indicated in Figure 7.

RQ1: Consistency and Rationality
We examined the consistency of participants' responses to the three repeated data actions and to Westin's Index questions. While the consistency test aimed to rule out the possibility of entirely random privacy attitudes among participants, we took a further step by examining the correlation between participants' privacy concerns and attitudes across all the tested scenarios. In this way, we tested our hypothesis that, in general, people's privacy attitudes result from their own logical reasoning. Our results suggest that participants' attitudes and concerns toward privacy scenarios exhibit consistency and rationality.
Method. We designed nine questions to test participants' consistency, covering the repeated scenario assessment (case 6 in Table 1) and Westin's Index [27] across three surveys (see Figure 1). Because the scenario selected for the consistency test already comprises three distinct data actions (data collection, processing, and usage), we did not include other cases as redundant scenarios across surveys, in order to prevent participant fatigue.
In each survey, we used each participant's average rating of the three data actions in the redundant scenario to represent their overall attitude. To gauge the consistency of participants' attitudes, we computed the intraclass correlation coefficient (ICC) for their ratings across the three surveys and calculated Pearson correlation coefficients between all pairwise combinations of the surveys. We employed a similar approach to assess the consistency of their responses to Westin's Index. We also investigated whether participants expressed the same concern categories regarding scenario 6 across the three surveys to assess the consistency of their reasons for discomfort. See Appendix A.1 for additional evaluation details.
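Both statistics can be computed from a participants × surveys matrix of mean ratings. The sketch below assumes the ICC(2,1) form (two-way random effects, absolute agreement, single measurement); the form actually used for Table 5 may differ, and the function names and ratings are hypothetical.

```python
import numpy as np


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: shape (n_participants, k_surveys), e.g. each participant's mean
    rating of the repeated scenario's data actions in surveys 1, 3, and 5.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between participants
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between surveys
    ss_error = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def pairwise_pearson(scores):
    """Pearson r between every pair of surveys (columns)."""
    return np.corrcoef(np.asarray(scores, dtype=float), rowvar=False)


# Hypothetical mean ratings for five participants on days 1, 3, and 5.
ratings = [[1.7, 2.0, 1.7], [4.3, 4.0, 4.3], [3.0, 3.3, 3.0],
           [2.3, 2.3, 2.0], [5.0, 4.7, 5.0]]
print(round(icc_2_1(ratings), 3))
print(pairwise_pearson(ratings).round(2))
```

The absolute-agreement form penalizes systematic shifts between surveys (for example, everyone rating one point lower on day 5), which Pearson correlation alone would not detect; this is one reason to report both.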
To examine the correlation between privacy concerns and attitudes, we conducted a linear regression of participants' average rating of the data actions in each of the 13 scenarios on the concern categories they expressed in the corresponding scenario. The value of each concern category for each participant and scenario is calculated as the sum of the corresponding concern labels in the participant's responses to the data actions within that scenario. We then examined the data action level by testing whether the concern categories had predictive effects on participants' privacy attitudes. To differentiate participants' positive and negative attitudes, we split the five-point comfort rating for each data action, treating scores below 3 ('somewhat uncomfortable' and 'extremely uncomfortable') as negative and all other scores as positive. We constructed four classification models evaluated with 10-fold cross-validation.
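The feature construction and the regression step can be sketched with NumPy. The sketch assumes concern annotations for one participant and scenario arrive as a 0/1 array of shape (data actions × concern categories); scenario-level features are the per-category sums, ordinary least squares with an intercept stands in for the linear regression, and binarization follows the rule above. Array shapes, variable names, and data are hypothetical, and the four classification models are not reproduced.

```python
import numpy as np


def scenario_features(concern_labels):
    """Sum 0/1 concern labels over the data actions of one scenario.

    concern_labels: array of shape (n_actions, n_categories) for one
    participant; returns one count per concern category.
    """
    return np.asarray(concern_labels).sum(axis=0)


def binarize(ratings):
    """Map 1-5 comfort ratings to 1 (negative, below 3) or 0 (positive)."""
    return (np.asarray(ratings) < 3).astype(int)


def fit_ols(features, mean_ratings):
    """Ordinary least squares of mean scenario rating on concern counts."""
    X = np.column_stack([np.ones(len(features)), features])  # add intercept
    coef, *_ = np.linalg.lstsq(X, mean_ratings, rcond=None)
    return coef  # [intercept, one slope per concern category]


# Hypothetical participant: three data actions, two concern categories.
actions = np.array([[1, 0], [1, 1], [0, 0]])
print(scenario_features(actions))  # [2 1]
print(binarize([1, 2, 3, 4, 5]))   # [1 1 0 0 0]

# Hypothetical scenario-level rows: concern counts and mean ratings.
features = np.array([[0, 0], [1, 0], [2, 1], [3, 2]])
mean_ratings = np.array([4.5, 4.0, 2.5, 1.0])
print(fit_ols(features, mean_ratings).round(2))
```

Negative slopes in such a fit correspond to the pattern reported below, where expressing more concern categories goes with lower comfort ratings.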
Results. For both Westin's questions and the ratings of the tested scenario, our findings indicate strong alignment between each user's own ratings across the three tests, and their privacy concerns also remained consistent. In the broader analysis of all scenarios, concern categories showed strong correlation with, and predictive effects on, attitudes, suggesting that participants' privacy attitudes exhibit rationality.
Privacy attitude consistency. The Pearson coefficients indicate a strong correlation across the three surveys within each measurement (see Table 4). The average ICC value (Table 5) for the average Likert scores is above 0.75 for case 6 and at least 0.67 for Westin's three questions, suggesting good reliability for all four tests [25]. The majority of participants' general privacy attitudes remained consistent across the three surveys. To identify outliers in the consistency test, we categorized participants' attitudes as either negative or positive, depending on whether their comfort ratings were below 3. Only eight participants exhibited varying attitudes across the three surveys, five of whom displayed a relatively neutral stance, as their ratings fell between 2 and 4. Of the three participants with higher rating variation, one assigned negative ratings only in the second survey, yet their free-text reasoning in that survey presented only positive feedback on the news-filtering system in case 6, such as "weeding out anything [they don't] want to see", similar to their responses in the other two surveys; we therefore consider this a mis-rating. The responses of the remaining two exceptions are analyzed in the following section.
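The outlier screen described above can be sketched as follows, assuming each participant contributes one mean rating per consistency survey. A participant is flagged when the negative/positive classification (below 3 versus not) differs across surveys, and a flagged participant is additionally marked neutral when every rating lies between 2 and 4. Treating that range as inclusive is an assumption, and the participant identifiers and ratings are hypothetical.

```python
def attitude(rating):
    """Negative if the (mean) comfort rating is below 3, positive otherwise."""
    return "negative" if rating < 3 else "positive"


def screen(participants):
    """Flag participants whose attitude flips across surveys 1, 3, and 5."""
    report = {}
    for pid, ratings in participants.items():
        if len({attitude(r) for r in ratings}) > 1:
            # Assumption: 'between 2 and 4' is read as an inclusive range.
            neutral = all(2 <= r <= 4 for r in ratings)
            report[pid] = "inconsistent (neutral)" if neutral else "inconsistent"
    return report


# Hypothetical mean ratings on days 1, 3, and 5.
participants = {"A": [2.7, 3.3, 3.0], "B": [1.3, 4.7, 4.3], "C": [4.0, 4.3, 4.0]}
print(screen(participants))
# {'A': 'inconsistent (neutral)', 'B': 'inconsistent'}
```

Flagged participants who are not neutral, like "B" here, are the ones whose free-text reasoning needs to be read to distinguish a genuine change of mind from a mis-rating.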
Privacy concern and reasoning consistency. Across all participants, 80.26% of labeled concern categories in scenario 6 remained consistent across all three surveys, suggesting that the majority of participants' privacy concerns were stable. Since the concerns expressed towards the scenario varied among individuals, we chose one outlier mentioned in the previous section and another participant with a more consistent concern pattern, and visualized their concern categories in Figure 5. The majority of participants demonstrated consistent reasoning across the three surveys, regardless of whether they had negative or positive privacy attitudes. Among those who expressed concerns, the most frequently reported issues centered on a lack of trust in algorithms and a lack of control over personal data. The Venn diagram on the right side of Figure 5 presents one typical concern pattern of the majority. On the other hand, participants with more positive perceptions of the scenario consistently referred to it as a "common practice" or as "providing benefits for users".
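One way to compute such a consistency percentage is sketched below, under an assumption about the metric that the text does not state: for each participant, a concern category counts as consistent if it was labeled in all three surveys (the center of the Venn diagram), and the percentage is taken over all categories labeled in at least one survey (the union). The category names and per-participant data are illustrative.

```python
def consistency_rate(responses):
    """Percentage of labeled concern categories present in all three surveys.

    responses: {participant: [set_day1, set_day3, set_day5]} of concern categories.
    """
    consistent = labeled = 0
    for surveys in responses.values():
        union = set().union(*surveys)              # labeled at least once
        intersection = set.intersection(*surveys)  # labeled in every survey
        consistent += len(intersection)
        labeled += len(union)
    return 100 * consistent / labeled if labeled else 0.0


# Hypothetical concern categories for two participants.
responses = {
    "A": [{"Lack of Control", "Distrust of Algorithms"},
          {"Lack of Control", "Distrust of Algorithms"},
          {"Lack of Control", "Distrust of Algorithms", "Bias or Discrimination"}],
    "B": [{"Lack of Control"}, {"Lack of Control"}, {"Lack of Control"}],
}
print(f"{consistency_rate(responses):.2f}%")  # 75.00%
```

Pooling counts across participants, as here, weights participants who express many categories more heavily; averaging a per-participant ratio instead would weight everyone equally.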
Participants with neutral attitudes exhibited more complex considerations. Two participants with inconsistent attitudes (see Figure 5, left) varied in their benefit assessment across the surveys. Both participants treated the scenario as common practice for Internet companies and mentioned "improving browsing experience" in one survey, but expressed a desire for more initiative in another survey. However, these outliers' detailed reasoning often covered consistent themes across the three surveys. For instance, despite expressing different attitudes in two surveys, in the third survey participant P1 described news filtering as "able to save time" but added, "I don't like a third party hiding content from people or businesses that I am willingly following, which I decided to follow because I want to see those updates". This suggests that the participant used similar factors to assess the same privacy context within a short time frame. This participant's Venn diagram (Figure 5, left) also shows a certain degree of consistency, as there is only one concern category in the non-intersecting area.
Privacy concern and privacy attitude rationale. Apart from the concerns shared among participants with negative attitudes towards scenario 6, Figure 5 illustrates that the average rating tends to decrease as participants express more concern categories in each survey, corroborating our regression findings in Table 6. Most concern categories negatively affect participants' ratings across all scenarios, indicating their role in explaining participants' attitudes towards specific contexts. This relationship is further confirmed by the predictive effect of concerns on privacy attitudes (prediction accuracy of 87%), as shown in Table 7. We also introduced data action types (data collection, processing, sharing, and usage) as additional categorical variables in the prediction model, but they did not significantly alter the model's performance compared to the models in Table 7. This suggests that the predictive effect of concern categories is broad and not limited to specific types of data actions.
Additionally, the regression coefficients in Table 6 show that concern categories exert varying degrees of influence; categories such as 'Bias or Discrimination' have notably larger coefficients, indicating a stronger impact on participants' attitudes.
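The negative relationship between the number of concerns and the rating can be illustrated with a one-predictor least-squares fit. This is a simplified sketch under stated assumptions, not the Table 6 model (which, per its caption, uses one continuous variable per concern category); `ols_fit` and the data are hypothetical.

```python
def ols_fit(x, y):
    """Return (slope, intercept) of the least-squares line y ~ slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least two points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Hypothetical toy data: number of concern categories a participant expressed
# in one survey vs. their average comfort rating in that survey.
n_concerns = [0, 1, 2, 3]
avg_rating = [5.0, 4.5, 3.0, 2.5]
slope, intercept = ols_fit(n_concerns, avg_rating)
# A negative slope means more expressed concerns go with lower ratings.
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```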

RQ2: Correlation between ContextLabel and Privacy Concerns
To validate whether ContextLabel can effectively capture the essence of privacy contexts, we analyzed the labels' correlations with participants' privacy attitudes and concern categories. Our results suggest that ContextLabel exhibits stronger correlations with participants' comfort ratings and concern categories than the generic privacy index and category factors do.
Method. To measure the correlation between ContextLabel and comfort ratings for data actions, we calculated the Pearson correlation coefficient and the Kendall rank correlation across all scenarios, as shown in Table 8. The Pearson correlation is calculated as the point-biserial correlation, a special case of the Pearson correlation that measures the relationship between a continuous variable and a dichotomous variable. For comparison, we annotated the labels associated with Contextual Integrity elements and added the sender parameter.
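As a sketch of the two coefficients, the plain-Python code below computes the point-biserial correlation as Pearson's r with the label coded 0/1, and Kendall's tau in its tau-b form, which adjusts for the many tied pairs a binary label creates. That the authors used tau-b is an assumption; in practice, `scipy.stats.pointbiserialr` and `scipy.stats.kendalltau` (tau-b by default) compute these coefficients and also return p-values. All data are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson's r between two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def point_biserial(label_present, ratings):
    """Point-biserial r: Pearson's r with a dichotomous (0/1) variable."""
    return pearson([1.0 if v else 0.0 for v in label_present], ratings)


def kendall_tau_b(x, y):
    """Kendall's tau-b, which adjusts the denominator for tied pairs."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from every term
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + tied_x_only) *
                      (concordant + discordant + tied_y_only))
    return (concordant - discordant) / denom


# Hypothetical data: whether each data action carries a label such as
# 'Financial Loss' (1/0) and one participant's comfort rating for it.
has_label = [1, 1, 0, 0, 1, 0]
comfort = [2, 1, 4, 5, 2, 4]
print(round(point_biserial(has_label, comfort), 3))
print(round(kendall_tau_b(has_label, comfort), 3))
```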
We also computed the Pearson correlation and Kendall rank correlation using responses to Westin's three questions, as shown in Table 9. To account for individual differences, we assessed how frequently each ContextLabel appeared among the top five labels with the highest Kendall rank correlation with an individual's comfort ratings, as depicted in Figure 6.
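The frequency count behind Figure 6 might look like the following sketch, which assumes each user's Kendall coefficients and p-values have already been computed for every label. Ranking by absolute coefficient is an assumption (the text says "highest"), motivated by influential labels tending to lower comfort; the function name and numbers are hypothetical.

```python
from collections import Counter


def count_influential(per_user, top_k=5, alpha=0.05):
    """Count how often each label is 'influential' across users.

    per_user maps a user id to {label: (kendall_tau, p_value)}. A label is
    influential for a user if it is among the top_k labels by |tau|
    (assumed ranking) and its p-value is below alpha.
    """
    counts = Counter()
    for label_stats in per_user.values():
        ranked = sorted(label_stats.items(),
                        key=lambda item: abs(item[1][0]), reverse=True)
        counts.update(label for label, (_, p) in ranked[:top_k] if p < alpha)
    return counts


per_user = {  # hypothetical coefficients and p-values
    "u1": {"Financial Loss": (-0.60, 0.01), "Unexpected Use": (-0.50, 0.02),
           "Data Breach": (0.10, 0.50), "Third Party Transfer": (0.05, 0.70),
           "Opportunity Loss": (0.02, 0.90), "Reputation Loss": (0.01, 0.95)},
    "u2": {"Financial Loss": (-0.40, 0.03), "Unexpected Use": (-0.10, 0.40)},
}
print(count_influential(per_user).most_common())
```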
For the correlation between ContextLabel and concern categories, we defined a concern score to gauge the extent to which labels contribute to general levels of privacy concern (see Appendix A.2 for evaluation details); Figure 7 shows the overall results. We also calculated the odds ratio between each label and the expressed concern categories for each user, indicating the strength of the association within an individual's profile. Concern-label sets with odds ratios greater than 10 and significance levels below 0.05 were identified. We then counted the occurrences of each set across all users, reflecting the transferability of specific concerns across scenarios with that label. Figure 8 displays the top 20 sets with the highest occurrences.
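A minimal sketch of the per-user screening, assuming one 2×2 table per (label, concern) pair built from a user's data actions (label present or absent × concern expressed or not) and a two-sided Fisher exact test for significance. The paper does not name its test, so Fisher's test is an assumption (`scipy.stats.fisher_exact` is a library alternative); all names and data are hypothetical.

```python
import math
from collections import Counter


def odds_ratio(a, b, c, d):
    """Odds ratio (a*d)/(b*c) for the table [[a, b], [c, d]]."""
    if b * c == 0:
        return math.inf if a * d > 0 else float("nan")
    return (a * d) / (b * c)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def pmf(x):  # hypergeometric probability of x in the top-left cell
        return math.comb(row1, x) * math.comb(n - row1, col1 - x) / math.comb(n, col1)

    p_obs = pmf(a)
    support = range(max(0, col1 - (n - row1)), min(row1, col1) + 1)
    # Sum all tables at least as extreme (no more probable) than the observed one.
    return min(1.0, sum(pmf(x) for x in support if pmf(x) <= p_obs * (1 + 1e-7)))


def strong_pairs(actions, min_or=10.0, alpha=0.05):
    """(concern, label) pairs with odds ratio > min_or and p < alpha for one user.

    actions is a list of (labels, concerns) set pairs, one per data action.
    """
    labels = set().union(*(labs for labs, _ in actions))
    concerns = set().union(*(cons for _, cons in actions))
    pairs = set()
    for lab in labels:
        for con in concerns:
            a = sum(lab in labs and con in cons for labs, cons in actions)
            b = sum(lab in labs and con not in cons for labs, cons in actions)
            c = sum(lab not in labs and con in cons for labs, cons in actions)
            d = len(actions) - a - b - c
            if odds_ratio(a, b, c, d) > min_or and fisher_two_sided(a, b, c, d) < alpha:
                pairs.add((con, lab))
    return pairs


def count_pairs(users):
    """How many users show each strong (concern, label) pair."""
    counts = Counter()
    for actions in users.values():
        counts.update(strong_pairs(actions))
    return counts


users = {  # hypothetical annotations
    "u1": [({"Financial Loss"}, {"High risks"})] * 5 + [(set(), set())] * 5,
    "u2": [({"Financial Loss"}, set())] * 5 + [(set(), {"High risks"})] * 5,
}
print(count_pairs(users))
```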
In Sections 4.2 and 4.3, for the redundant scenario 6 included in all three surveys, we used only the responses from the third day's survey, since Section 4.1 already showed the responses to be consistent across the surveys.
Results. Overall, non-exclusive labels like 'High Risk Significance', 'Price Discrimination', and 'Financial Loss' demonstrated stronger correlations with participants' comfort ratings and concern categories than the exclusive category factors and the Privacy Segmentation Index. This underscores the effectiveness of ContextLabel in capturing crucial aspects of diverse privacy contexts and representing individuals' perceptions.
Correlations between ContextLabel & Comfort Score. People exhibit varying sensitivities to different labels, but some non-exclusive ContextLabels have a noticeable impact on the majority's privacy attitudes.
The correlation between ratings and Westin's questions (Table 9) is notably weaker than that of the context labels (Table 8). However, many Pearson coefficients for the labels do not indicate a strong correlation with ratings. Given the predictive effect of concern categories on individuals' attitudes (Table 7), this may be explained by individual variance in the concern categories people associate with the same ContextLabel. For instance, the Pearson correlation between 'Price Discrimination' and comfort rating ranges from -0.651 to 0.007 among participants, with Kendall correlations ranging from -0.595 to 0.015. Despite this individual variation, non-exclusive labels like 'Unexpected Use', 'High Risk Significance', 'Price Discrimination', and 'Financial Loss' demonstrated stronger correlations with participants' comfort ratings than the category factors from Contextual Integrity. These labels were also frequently influential on individuals' comfort levels (see Figure 6) and received lower average ratings (Figure 3). Labels with notable impacts are discussed in the following section.

Correlations between ContextLabel & Concern Categories.
Though privacy concerns towards the same ContextLabel vary among individuals, ContextLabel captures more transferable concern categories than category factors and the Privacy Segmentation Index. Figure 7 reveals variations in how participants associate labels with specific concern categories on average. In contrast, Figure 8 illustrates the concern-label sets that show strong correlations within individual participants' data. Despite the diverse individual concerns shown in Figure 4, Figure 8 shows that approximately 50% of participants closely link 'Bias or Discrimination' to six ContextLabels. Labels like 'Third party transfer' and 'Data Breach' also connect to concern categories, aligning with Figure 7. This suggests ContextLabel's potential in pinpointing primary sources of concern. Moreover, Figure 8 underscores that category factors alone are inadequate for capturing transferable concerns, since concern categories exhibit significant correlations with non-exclusive labels such as 'Financial Loss' and 'Unexpected Use', but not with collected information type or sender type. Among the individuals who expressed the 'Bias or Discrimination' concern in various scenarios, responses to each question of Westin's index in our survey spanned the entire scale from 1 to 4 without any discernible pattern, indicating that the general criteria used in Westin's index failed to capture participants' concerns within specific contextual scenarios.
Influential labels. While Figure 8 shows only the top 20 of 252 concern-label sets, we identified a total of 111 sets with high odds ratios in individuals' profiles, among which the 'High Risk Significance', 'Algorithmic Assessment Imperfections', 'Empathy for the Vulnerable', 'Financial Loss', 'Opportunity Loss', and 'Unexpected Use' labels appear most frequently, each in nine or ten sets. Of these, 'High Risk Significance', 'Financial Loss', and 'Unexpected Use', together with 'Price Discrimination', are also closely related to participants' discomfort based on the results in Table 8 and Figure 3. Notably, these are labels that determine whether the data actions lead to tangible harm or pose threats to individuals. This supports our hypothesis that such labels are very likely to arouse concern and therefore influence privacy attitudes. In addition, participants showed less concern for 'Opportunity Loss' and 'Reputation Loss' than for 'Financial Loss', though all lead to potential harm. This disparity suggests that individuals prioritize tangible harms, such as financial loss, over more abstract or latent consequences like reputation or opportunity loss. In contrast, most category factors, such as the attributes (i.e., collected data types) in the Contextual Integrity framework, did not display significant correlations with privacy concerns. However, labels synthesized from Contextual Integrity, like 'Third Party Transfer', exhibited correlations with specific concern categories. This implies that non-exclusive ContextLabels are better at capturing the aspects of privacy contexts that genuinely concern people.

RQ3: Prediction Modeling
The rationality of participants' privacy attitudes (RQ1) and the effectiveness of ContextLabel in capturing dominant concern categories (RQ2) suggest that ContextLabel may predict participants' privacy attitudes towards unseen data actions. The results in this section support this assumption.
Method. We framed both concern category prediction and privacy attitude prediction as classification tasks. We built the ContextLabel prediction model using 18 labels and examined the predictive effect of ContextLabel on individuals' concerns and privacy attitudes. We adopted Contextual Integrity category factors (see Table 3) and Westin's Privacy Segmentation Index as our baselines. As Westin's index has been found to be ineffective in predicting users' contextual privacy attitudes or concerns [52], we used each participant's average scores on Westin's three questions across the three surveys to build the prediction model. For ContextLabel, we trained a Naive Bayes classifier that takes ContextLabels as input and predicts whether a particular user has a certain concern category. We evaluated the model predictions with leave-one-out cross-validation (LOOCV). For privacy attitude prediction, we used the same threshold as Table 7 in RQ1 to differentiate positive and negative attitudes, making attitude prediction a binary classification task. We built a neural network in the form of a two-layer multi-layer perceptron (MLP) to model participants' decision process. To test the prediction effect on novel scenarios (i.e., data actions with novel label combinations), we evaluated models using cross-validation in which, in each fold, the data actions serving as the test set were excluded from the training set. All models were built using the Scikit-learn package.
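For one participant and one concern category, the Naive Bayes step with LOOCV could be sketched as below. The Bernoulli variant with Laplace smoothing is an assumption motivated by the binary label inputs; scikit-learn's `BernoulliNB` with `LeaveOneOut` would be the library counterpart. The sketch reports recall, matching the emphasis in Table 10; function names and data are hypothetical.

```python
import math


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Fit per-class log-priors and Laplace-smoothed P(label present | class)."""
    model = {}
    for cls in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == cls]
        log_prior = math.log(len(rows) / len(y))
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(len(X[0]))]
        model[cls] = (log_prior, probs)
    return model


def predict_bernoulli_nb(model, x):
    """Return the class with the highest log-posterior for binary vector x."""
    def log_posterior(cls):
        log_prior, probs = model[cls]
        return log_prior + sum(math.log(p if xj else 1 - p) for xj, p in zip(x, probs))
    return max(model, key=log_posterior)


def loocv_recall(X, y):
    """Recall of the positive class (concern expressed) under leave-one-out CV."""
    tp = fn = 0
    for i in range(len(X)):
        if y[i] != 1:
            continue  # recall only needs predictions for held-out positives
        model = fit_bernoulli_nb(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict_bernoulli_nb(model, X[i]) == 1:
            tp += 1
        else:
            fn += 1
    return tp / (tp + fn) if tp + fn else float("nan")


# Hypothetical: each row marks two ContextLabels (present = 1) for one data
# action; y marks whether this participant expressed a given concern category.
X = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 0]]
y = [1, 1, 1, 0, 0, 0]
print(loocv_recall(X, y))
```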
Results. Combined with personal preferences, ContextLabel shows better overall predictive effects on privacy attitudes and concern categories than category factors and the Segmentation Index.
Concern category prediction towards unseen data actions. In our survey, we only considered prominent categories, requiring consensus from at least two of three label workers for each free-text response. This resulted in a sparse individual-level concern distribution and, consequently, high overall prediction accuracy for all models. To gauge model performance, we therefore emphasize recall scores in Table 10, focusing on each model's ability to identify existing concerns. Figure 9 displays model performance across all participants. Notably, ContextLabel outperforms the other models, especially in categories such as 'Bias or discrimination', 'Unexpectation', and 'Invasive monitoring', which are among the top five most expressed concerns. The low average recall score is attributed to sparse data arising from infrequent expressions or divergent decisions among crowd workers for specific concerns, such as 'Lack of Informed Consent', 'Lack of Respect for Autonomy', and 'No Control'. Notably, the 'No Control' category received 652 annotations, the majority (64.5%) of which were contributed by a single worker; as a result, fewer than 40% of these annotations were retained as concern labels, leading to lower recall scores.
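The two-of-three consensus filter, and why sparse concerns make recall more informative than accuracy, can be sketched in a few lines; the function names and toy data are hypothetical.

```python
from collections import Counter


def consensus(worker_labels, min_votes=2):
    """Keep concern categories tagged by at least min_votes annotators."""
    votes = Counter(cat for labels in worker_labels for cat in set(labels))
    return {cat for cat, n in votes.items() if n >= min_votes}


def accuracy_and_recall(y_true, y_pred):
    """Accuracy over all cases and recall over positive (concern present) cases."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return accuracy, recall


# Three annotators label one free-text response; only 'No control' survives.
print(consensus([{"No control", "Unexpectation"}, {"No control"}, {"Deception"}]))

# A sparse concern (2 of 20 responses): always predicting "absent" looks
# accurate but misses every expressed concern.
y_true = [0] * 18 + [1, 1]
print(accuracy_and_recall(y_true, [0] * 20))
```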
Privacy attitude prediction towards unseen data practices. Table 11 displays the model performances. At the individual level, the ContextLabel model achieved significantly higher overall attitude prediction accuracy (73%) than category factors (59%) and the Privacy Segmentation Index (56%). Recall and F1-score also increased, reaching around 20% higher than those achieved by the Contextual Integrity model trained without individual preference specification. This indicates a noteworthy predictive effect on people's privacy attitudes towards unfamiliar scenarios when contextual information and personal preferences are combined. For the ContextLabel model trained on individual data, the top three mispredicted data actions were all annotated with only four context labels and lacked influential labels such as 'Empathy for the Vulnerable', 'Financial Loss', 'High Risk Significance', 'Price Discrimination', and 'Unexpected Use' (see Figure 6). Furthermore, these actions are categorized as either data collection or processing; they do not represent the final actions that directly lead to consequences. Participants tend to exhibit diverse attitudes toward these actions: for those three cases, approximately 50% of participants expressed positive sentiments and the remaining 50% expressed negative ones.
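The all-users evaluation in Table 11 holds out one story per fold so that its data actions never appear in training. The sketch below shows that split together with the threshold-3 binarization; treating ratings strictly above 3 as positive is an assumption, and `MajorityClassifier` is a hypothetical stand-in for the MLP (any object with `fit`/`predict` could be passed instead).

```python
def binarize(rating, threshold=3):
    """Assumption: ratings strictly above the threshold count as a positive attitude."""
    return 1 if rating > threshold else 0


def leave_one_scenario_out(scenario_ids):
    """Yield (held_out, train_indices, test_indices), one fold per scenario."""
    for held_out in sorted(set(scenario_ids)):
        train = [i for i, s in enumerate(scenario_ids) if s != held_out]
        test = [i for i, s in enumerate(scenario_ids) if s == held_out]
        yield held_out, train, test


def grouped_cv_accuracy(X, ratings, scenario_ids, make_model, threshold=3):
    """Accuracy when every scenario is predicted by a model that never saw it."""
    y = [binarize(r, threshold) for r in ratings]
    correct = 0
    for _, train, test in leave_one_scenario_out(scenario_ids):
        model = make_model()
        model.fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in test])
        correct += sum(int(p == y[i]) for p, i in zip(preds, test))
    return correct / len(y)


class MajorityClassifier:
    """Hypothetical stand-in model: predicts the most frequent training class."""

    def fit(self, X, y):
        self.label = max(sorted(set(y)), key=y.count)  # ties go to the smaller label
        return self

    def predict(self, X):
        return [self.label] * len(X)


# Hypothetical: six data actions from three scenarios, with ContextLabel vectors.
X = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [1, 1]]
ratings = [5, 5, 5, 1, 5, 5]
scenarios = [1, 1, 2, 2, 3, 3]
print(grouped_cv_accuracy(X, ratings, scenarios, MajorityClassifier))
```

With scikit-learn installed, `make_model` could be, for example, `lambda: MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000)`; the layer sizes are illustrative, since the paper does not report them. scikit-learn's `LeaveOneGroupOut` produces the same folds as `leave_one_scenario_out`.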

CROSS-CHECKING WITH EXISTING DATASET
Before carrying out our surveys, we analyzed the available survey results from Shvartzshnaider et al.'s study [47] to validate the predictability and similarity of users' privacy attitudes in similar scenarios. Table 12 presents the average accuracies of binary logistic regression, SVM, and k-nearest neighbors classifiers using leave-one-out cross-validation (LOOCV) on the dataset from Shvartzshnaider et al.'s study [47].
The user models achieved an average accuracy of 71.04%. Despite variations in scenario descriptions, the scenarios share an educational context and a limited set of Contextual Integrity labels. The observed accuracies suggest that users' privacy preferences can be applied across comparable contexts and used to predict their attitudes in similar scenarios. These findings align with our observations that users' privacy attitudes are influenced by their rational reasoning and that potential contextual predictors can be employed to model concerns and predict attitudes.

DISCUSSION
Our results illustrate the diversity of individuals' privacy attitudes and concern categories across various contexts. Nonetheless, individuals consistently apply their own logic and exhibit relatively stable reasoning for their expressed privacy attitudes. This provides opportunities to model their decision-making processes by identifying the factors that raise their awareness when they encounter privacy issues. We identified several non-exclusive ContextLabels and compared them with category factors and generic indices such as Westin's Index. ContextLabel shows stronger correlations with individuals' privacy concerns and attitudes towards specific contexts. While not exhaustive, our non-exclusive labels effectively predict people's privacy attitudes, demonstrating the feasibility of such predictions in real-world scenarios.
User rationality behind privacy attitudes. The consistency of participants' expressed attitudes toward the same contexts and the strong correlation between their comfort ratings and concern categories demonstrate the rationality behind their privacy attitudes. Even for concern categories that rarely appear in participants' responses (see the right portion of Figure 4), specific participants still pay special attention to them. For instance, the participant who reported 'Lack of Alternative Choice' most frequently cited it as the reason for discomfort in three specific scenarios. When explaining high comfort scores in other scenarios, the same participant consistently used a similar expression, emphasizing the importance of "having the choice to opt out".
ContextLabel associated with risk and benefit assessment.
Our results align with the assumption from previous works that users' decisions and actions are driven by their intent to optimize their benefits [29]. We found that labels associated with final outcomes and tangible harms (e.g., 'Financial Loss'), rather than with information collection or processing, have stronger correlations with

LIMITATIONS AND FUTURE WORK
While our experiments have shown participants' consistency within a short time frame, future research should consider testing over a longer duration to gain a more comprehensive understanding of people's rationality in privacy scenarios. Furthermore, many studies have examined the privacy paradox [18], i.e., the inconsistency between behavior and intent. The insights into rationality from our study can assist in modeling people's privacy attitudes, but extensive research is still needed to fully grasp the intricacies of the privacy paradox. Users' decisions are influenced by various biases, so our predictions should be used with caution in certain policymaking situations. We defined 18 labels to test their correlation with people's privacy attitudes and concerns. A wider range of labels could be identified and used in future studies to provide a more comprehensive method for modeling users' perception and decision processes in privacy scenarios. Researchers could also give labels more fine-grained attributes or scores to capture more nuances of contexts, leading to more accurate concern predictions. Additionally, although the annotation in our study was completed by two professional annotators, this heuristic-like annotation process could be further explored using large language models.
Our paper is an exploratory work that studies the feasibility of predicting users' privacy concerns across domains. Future research could incorporate additional scenarios to improve the generalization performance of the prediction model. While our survey design, including features like free-text explanations and the consistency test, addresses concerns related to crowd workers' inattention on AMT, further validation of ContextLabel could involve a more diverse selection of participants from various platforms.

CONCLUSION
We presented ContextLabel, a novel method for capturing users' privacy profiles across domains and predicting their privacy attitudes towards unseen data practices. By incorporating non-exclusive labels and users' preferences, ContextLabel offers more accurate modeling of privacy concerns than categorical factors and generic indices. The results of our empirical study involving 38 participants over five days demonstrated the feasibility of predicting users' privacy concerns across domains. We observed consistent privacy attitudes among participants and identified contextual labels that correlated with users' privacy concerns. Leveraging these insights, we built a predictive model that achieved higher accuracy (73%) than the Privacy Segmentation Index (56%) and categorical contextual factors (59%).
Table 13: Aggregated survey results. Average scores are the average ratings of all data actions in each scenario, and participants' concerns are the total number of concerns from 14 concern categories labeled in all free-text responses. ContextLabel counts are the number of ContextLabels we annotated for all data actions in each scenario, using the codebook with label definitions in Table 1.

Data Collection
A retail store offers the user a free loyalty card, associating all purchases with her unique ID.

Data Sharing
The retail store sells the user's data to a health insurance company.

Data Processing
The health insurance company analyzes the user's purchases and concludes she has a sedentary lifestyle and an unhealthy diet.

Data Usage
The insurance company raises the user's insurance rates.
Figure 11: A privacy storyboard of "Loyalty card in a retail store".A retail store collects users' data through a loyalty card and uses the data for insurance and coupon personalization.
Data Collection
An e-commerce company installs cameras to record customers' behaviors inside a retail store, to make it checkout-free.

Data Processing
The company develops algorithms to identify users and track items they place in their baskets.

Data Usage
The company plans to offer automated pricing. Prices will fluctuate in real-time based on demand.
Figure 12: A privacy storyboard of "Checkout-free retail store".An e-commerce company opens a checkout-free retail store by installing various sensors inside a physical store.

Data Collection
A game company records all of its users' in-game chat logs to support further analysis.

Data Processing
The company develops algorithms to identify abusive language usage.

Data Sharing
The company sells the data to several staffing agencies, with personally identifiable information.

Data Usage
The staffing agencies use the in-game reputation scores to rank and filter their job candidates.
Figure 13: A privacy storyboard of "Game chat log".An online game company uses its chat logs to identify potential problems in the workplace.

Data Collection
A pregnancy tracking app collects users' menstrual data and establishes a partnership with many companies.

Data Processing
The app aggregates the data of employees in the same company and removes personally identifiable information.

Data Sharing
The app developers sell aggregated health data to various employees.

Data Usage
The employer uses that data to minimize healthcare spending and better plan human resources.
Figure 14: A privacy storyboard of "Pregnancy intimate data".A pregnancy app shares users' intimate body data with their employers.

Data Collection
An online dating app collects information about their users, including demographic information and various behavioral data.
Data Usage
For one day, the app hides all profile photos to study the new social interactions.
Figure 15: A privacy storyboard of "Data science experiments in a dating app".An online dating app conducts several experiments to understand the nature of romance.

Data Collection
A company records users' sent/received/drafted emails and email contacts in its email service.

Data Sharing
The email team shares users' email data with a newly launched social network service.

Data Usage
When a new email user joins the social network, the user will follow her/his email contacts automatically.
Figure 16: A privacy storyboard of "Email contacts for social network bootstrapping".A technology company appropriates users' email data to bootstrap a new social network service.

Data Collection
A fitness tracker captures users' daily health data (e.g., steps, heart rate). Users can also log activities manually.

Data Processing
The company develops algorithms to analyze the data and predicts metabolism activities (type, duration, and intensity).

Data Sharing
The company ofers a social feature where everyone can share their activities.
By default, these profiles are public to search engines.

Data Collection
When a user makes purchases at a retail store (online/offline), the company collects various behavior data.

Data Processing
The company develops algorithms to predict user traits (e.g., pregnancy) using the purchase history.

Data Usage
The company sends out tailored coupons to these parents-to-be. The underlying goal is to re-shape these users' shopping habits.

Data Collection
An insurer asks its users to submit images of their photo IDs and complete a series of tasks to register their faces.

Data Processing
The insurer analyzes users' face images to gauge customers' health (e.g., BMI).

Data Usage
Policyholders get discounts on their monthly premiums based on how much body fat they have, as calculated by the scan.

Data Collection
An online travel agency collects users' data (e.g., operating system, device type) when users search and book flights on their service.

Data Processing
The agency aggregates the purchase behavior and finds that users using different devices are willing to pay different amounts of money.

Data Usage
The agency decides to incorporate device-based price discrimination into their system.

Data Collection
A ride-sharing mobile app collects users' phone information (e.g., device models, battery information) whenever users use the app.

Data Processing
The company aggregates the battery information and discovers that users are more likely to pay a higher price if their battery is low.

Data Usage
The company decides to incorporate battery-based price surging to their app.
Figure 2 panels: example data action (an online travel agency collects users' data, e.g., operating system and device type, when users search and book flights on their service); survey task for crowd workers 1 (examine data action descriptions by rating and free text); label task for crowd workers 2 (label free-text responses from workers 1).
Figure 2: A survey example. The privacy scenario (left) is split into data actions and organized as an information flow. Crowd workers were involved in two different tasks: we first asked crowd workers to examine data action descriptions by rating them and writing free text (middle), then forwarded the collected free-text responses to another group of workers for privacy concern annotation (right).

PFigure 3 :Figure 4 :
Figure 3: Average participant distribution across different comfort levels for each label. From left to right, labels are ranked in increasing order of average comfort score. A lower bar represents a lower proportion of participants expressing a positive attitude towards data actions with the label.

Figure 5: Venn diagram of examples of two participants with low or high concern consistency.Concern categories absent in the circles represent the ones unexpressed by the participant across all three surveys.

Figure 6: Influential labels for individuals' comfort levels. A label is considered influential for a user when it ranks in the top five among all labels by Kendall rank correlation coefficient with that user's comfort ratings and its p-value is less than 0.05. The y-axis represents how often each label was identified as influential across all users.
Concern categories (axis labels): Unexpectation, Bias or discrimination, No control, Invasive monitoring, Lack of informed consent, High risks, Lack of respect for autonomy, Lack of trust for algorithms, Deception, Insufficient data security, Data commodification, Lack of protection for the vulnerable, Insufficient anonymization, Lack of an alternative choice.

Figure 7 :Figure 8 :
Figure 7: The average number of participants who expressed a specific concern category for each label. Concerns are ordered left to right by the sum of each column. We excluded the sender type from the Contextual Integrity factors because it did not exhibit high odds ratios with concern categories, as the labels in Figure 8 did (odds ratio > 10 with p-value < 0.05), suggesting weak correlation.

Figure 9: The average recall score of all participants' expressed concerns for each concern category; categories are ordered left to right by recall score in decreasing order. ContextLabel shows significantly higher scores in most concern categories.

Figure 10: A privacy storyboard of "Search engine clickthrough data". A company records users' clickthrough behavior non-anonymously in an A/B test experiment and uses the data for advertising and search personalization.

Figure 17: A privacy storyboard of "Fitness tracking".A wearable technology company collects users' intimate behavior data and makes them public by default.

Figure 18: A privacy storyboard of "Retail store pregnancy". A retail store predicts users' pregnancy status by analyzing their purchase history.

Figure 19: A privacy storyboard of "Insurer employs AI". An insurance company uses facial-recognition technology to identify untrustworthy and unprofitable customers.

Figure 20: A privacy storyboard of "Dynamic pricing".Technology companies collect users' behavior data to adjust the service price dynamically.

Figure 21: A privacy storyboard of "Dynamic pricing".Technology companies collect users' behavior data to adjust the service price dynamically.

Table 1 :
Scenarios used in surveys to gauge privacy attitudes.The split data actions are listed in Appendix C. Case 6 is the scenario used as the redundant scenario in the consistency test.

Table 3 :
Contextual Integrity factors used in the scenario annotation. The subject element, which is "users" in all tested scenarios, is not listed. The recipient and transmission principle factors are each divided into two categories. In Table 8, two binary variables (i.e., 'Third Party Transfer' and 'Absence of Consent') are used to represent these two factors.

Table 4 :
Pearson's correlation of average ratings of the redundant scenario (Case 6) and three questions from Westin's Index (WQ) between surveys from all participants. All correlations are significant at the 0.01 level.

Table 5 :
Intraclass Correlation Coefficient (ICC) of users' average ratings of the redundant scenario (Case 6) and three questions from Westin's Index (WQ). All results are significant at the 0.001 level.

Table 6 :
Linear regression on the average of To represent participants' attitudes toward each scenario, the evaluation pertains to data at the complete scenario level rather than split data actions. The value of each concern category is calculated as the sum of the corresponding concern labels in each participant's responses to data actions within the specific scenario, and is used as a continuous variable. The reported coefficients are unstandardized. * indicates p < .05 and ** indicates p < .001.

Table 7 :
Accuracy, F1 score, and recall of logistic regression (Logistic), Support Vector Classification (SVC), AdaBoost classifier (AdaBoost), and k-nearest neighbors classifier (KNN) on attitude prediction using concern categories (binary variables). The average results of 10 folds are reported with standard deviations. The evaluation pertains to data at the data action level.

Table 8 :
Pearson correlation (calculated as point-biserial correlation) and Kendall rank correlation between ContextLabel and comfort rating. Labels are treated as binary variables and ratings as continuous. Labels marked with CI are Contextual Integrity factors (see detailed definition in Table 3).

Table 9 :
Pearson correlation and Kendall rank correlation between participants' responses to three questions from Westin's Index (WQ) and their comfort rating towards data actions.

Table 10 :
Average recall and average accuracy, with standard deviation in parentheses. The reported results are averaged across all participants' profiles.

Table 11 :
The average prediction accuracy, recall, and F1-score of the MLP models, with a threshold of 3 to differentiate positive and negative attitudes. Individual scope means the models are trained and evaluated on individual users' data. All-users scope refers to cross-validation in which the models were trained on all users' data and each story served as the test set in one fold.

Table 12 :
Average accuracy of three types of prediction models across all users in each survey from a separate study.