Assessing User Apprehensions About Mixed Reality Artifacts and Applications: The Mixed Reality Concerns (MRC) Questionnaire

Current research in Mixed Reality (MR) presents a wide range of novel use cases for blending virtual elements with the real world. This yet-to-be-ubiquitous technology challenges how users currently work and interact with digital content. While offering many potential advantages, MR technologies introduce new security, safety, and privacy challenges. Thus, it is relevant to understand users’ apprehensions towards MR technologies, ranging from security concerns to social acceptance. To address this challenge, we present the Mixed Reality Concerns (MRC) Questionnaire, designed to assess users’ concerns towards MR artifacts and applications systematically. The development followed a structured process considering previous work, expert interviews, iterative refinements, and confirmatory tests to analytically validate the questionnaire. The MRC Questionnaire offers a new method of assessing users’ critical opinions to compare and assess novel MR artifacts and applications regarding security, privacy, social implications, and trust.


INTRODUCTION
Mixed Reality (MR) [70] research is a growing field covering a broad spectrum of technologies and applications that blur the boundaries between digital and real worlds. Considering the evolution of MR over the past years, we observed that many innovations have primarily brought incremental improvements to MR technologies. As a consequence, MR devices have become more commonly available through smartphones [50] and even more interwoven through the use of head-mounted displays [45,54]. Fueled by the commercial success of the Microsoft HoloLens 2 in industry settings and further expectations towards the Apple Vision Pro in consumer use, MR might become omnipresent soon.
The transition from fiction to reality has steadily progressed in recent decades with the continued research in this field and the emergence of commercially available MR products. Previous research has extensively investigated use cases (e.g., in the context of work [56] or education [35]) as these technologies become increasingly accessible. At the same time, it is essential not only to evaluate their usability and potential benefits but also to understand the concerns and apprehensions that MR devices raise with their integration into our lives. Existing issues related to hardware performance, software optimization, and interaction design tend to improve over time as computational power increases and hardware shrinks. As a result, technical challenges that currently hinder seamless MR experiences will likely diminish over time. Yet, it is essential to recognize that the evolution of MR is not solely a matter of technological advancement. The challenge lies in addressing individuals' potential apprehensiveness about the technology.

RELATED WORK
With the rapid advancements in MR technology, understanding users' apprehensions about MR technology is crucial for its successful integration into everyday life. MR has shown tremendous potential in various domains, but its widespread adoption is impeded by several challenges that need to be addressed for it to become a mainstream technology [35]. By giving an overview of current research about novel challenges in MR, we aim to provide a comprehensive backdrop against which user concerns can be effectively evaluated in the later sections.

Social Acceptance and Social Implications: Challenges to the Ubiquity of MR
One of the critical barriers to the widespread adoption of MR is the lack of social acceptance. A 2021 study by Thomas et al. [71] sheds light on the barriers to social acceptance surrounding MR devices. Despite the functional benefits of MR technology, the study reveals that certain factors genuinely worry everyday users. One of the primary barriers is the perceived social awkwardness associated with wearing MR devices in public, which can lead to feelings of self-consciousness and reluctance to embrace this technology.
Moreover, the study mentions that the appearance and design of MR devices are critical factors influencing social acceptance, as aesthetically unappealing or intrusive devices may deter individuals from incorporating them into their daily lives. To foster broader social acceptance of MR, the study emphasizes the importance of improving the functionality and user experience and addressing these social and psychological barriers to ensure MR devices become seamlessly integrated into society's fabric.
Slater et al. [69] identified a number of ethical issues that ought to be considered in the future development of MR technologies. Next to common privacy concerns due to the vast amount of data collected by MR devices (further discussed in Section 2.2), the publication illustrates how highly realistic VR and AR environments can impact users emotionally, psychologically, and socially. These impacts include but are not limited to the ubiquity of MR, akin to mobile technology, as it can impede meaningful real-world interactions, potentially resulting in social isolation. This shift towards MR may also cultivate a preference for virtual interactions over real-life ones, leading to societal withdrawal. Moreover, the potential "superrealism" of MR experiences may lead some individuals to neglect their physical well-being, paralleling extreme cases of excessive video game usage where the boundary between the virtual and physical worlds blurs. Immersive MR environments can also encourage imitative behaviors that individuals would typically avoid in reality, either through gradual exposure or emulation of actions taken by virtual characters. The persuasive power of MR, particularly in highly realistic iterations, raises ethical concerns when employed to modify emotions and behaviors for potentially harmful ends. Furthermore, this capacity to manipulate sensory experiences raises questions about the reliability of sensory evidence in both legal and societal contexts.

Security, Safety, and Privacy: Common Threats in a New Environment
According to Gugenheimer et al. [27], while a significant portion of research focuses on technological advancements in MR, it is equally crucial to emphasize research into the potential hazards and challenges that accompany these innovations. They determined the well-established topics of security, safety, and privacy in general computer science to be relevant for MR research. These aspects become more important as MR proliferates into other areas for wider adoption, including MR support at production lines [6,58], education [23,47], or transportation [41,42,49], while changing the perception and interaction capacities of users [22,67,70]. With such growth in MR, privacy concerns encompass two main viewpoints: that of the user and that of bystanders. User-related privacy issues revolve around the risks associated with biometric identification or surveillance of behavior and attention. In contrast, bystander privacy concerns how MR sensors, such as cameras, may impact individuals who did not consent to be observed by the technology [14].
In the context of trust, Jian et al. [38] discuss the increasing prevalence of automation in complex systems and everyday life. The authors review existing research on measuring trust in various contexts, such as social psychology and human-machine systems, highlighting the multidimensional nature of trust and the need for a more empirical understanding. Furthermore, the authors identify shortcomings of previous studies, including the lack of differentiation between trust and distrust, and emphasize the importance of assessing trust in the context of human-machine systems. From this, they derive the need for an empirically based tool for assessing trust in increasingly automated environments.
Further, Harborth and Pape [30] report that technical assessments of risks related to MR reveal that the technology introduces new privacy concerns that require immediate attention. Individuals using MR genuinely worry about their privacy, and these apprehensions significantly deter technology adoption. The study highlights the importance of addressing these privacy risks promptly and effectively to foster trust and confidence among users.
A unique aspect emerging in MR research is "immersive attacks" [1,8,76], which target users' physiological and psychological safety through perceptual manipulation rather than exploiting hardware or software vulnerabilities. These attacks leverage perceptual illusions and necessitate the development of protective layers to detect and prevent such manipulations, highlighting the distinctive challenges posed by MR technology.
Lastly, safety and health concerns are yet another barrier that must be addressed to facilitate the broader adoption of MR. Yuntao Guo et al. [28] reported on the safety and health concerns associated with location-based MR gaming applications. As these games blur the lines between virtual and physical environments, potential risks and hazards emerge that can impact players' well-being. The study mentions that one primary concern is the distraction factor, where players may become engrossed in the game and fail to pay adequate attention to their surroundings, leading to accidents or injuries. Additionally, prolonged usage of MR gaming apps can result in physical strain, eye discomfort, and even musculoskeletal issues, especially when players engage in prolonged or repetitive gameplay [43]. The study emphasizes the importance of understanding these safety and health implications, particularly for game developers and policymakers, to implement safety measures, provide user guidelines, and raise players' awareness of the responsible use of location-based MR gaming apps.

Related Questionnaires
Next to the objective key challenges that pertain to MR, acquiring the feedback of users is invariably a crucial part of the development of new technologies, be it in the field of MR or elsewhere. To this end, numerous questionnaires and scales have been developed to assess various aspects of user experiences within this domain. However, it is essential to note that these existing questionnaires often focus on specific dimensions of user perceptions and do not comprehensively address the diverse spectrum of concerns that may arise. This section briefly overviews these related questionnaires, highlighting their strengths and limitations.
One of the most widely known measures of user acceptance of technology is the Technology Acceptance Model (TAM), developed by Fred Davis in the 1980s [17,18]. It aims to measure acceptance by determining both the ease of use and the perceived usefulness of a technological system. The TAM has been further developed [72,73], and other publications aimed at extending the model by adding further factors, such as perceived enjoyment [51,63].
Notable is also the Attitudes toward Virtual Reality Technology Scale (AVRTS) [5], which uses the TAM as an initial model to further develop a scale for assessing attitudes towards VR technologies. All in all, the TAM, its variants, the AVRTS, and other commonly used scales in HCI research like the System Usability Scale (SUS) [26], the AttrakDiff [31], or the User Experience Questionnaire (UEQ) [66] are based on assessing the acceptance, the general usability, the hedonic and pragmatic qualities, and the general user experience, respectively. While these scales excel at evaluating usability and gauging user affinity for a particular artifact, their design does not prioritize the measurement of concerns or unfavorable opinions regarding those devices.
The Perceived Creepiness Technology Scale (PCTS) [78] stands out in this respect as it specifically seeks to evaluate an adverse emotion. The primary purpose of the PCTS is to allow designers and researchers to quickly assess new technologies that might elicit initial sensations of creepiness in users.
Next to the AVRTS, scales like the Augmented Reality Immersion (ARI) questionnaire [25] and various presence questionnaires [62,77] seek to ensure that the measurements are relevant and accurate when applied to MR use cases, necessitating the development of novel questionnaires tailored to these technologies. The Virtual Reality Sickness Questionnaire (VRSQ) [44] and the Augmented Reality Sickness Questionnaire (ARSQ) [36] aim to measure the immediate negative impact of MR on the users' well-being, but to the authors' best knowledge, no scales exist that aim to determine the long-term effect of MR on its users.
Remotely related is the Concerns-Based Adoption Model (CBAM) with its Stages of Concern Questionnaire (SoCQ) [24], an educational framework developed in the late 1970s. It is designed to understand and facilitate the process of educational innovation and change, particularly in the context of school settings. Although the questionnaire may not be suitable for assessing concerns related to MR technology and its users, the stages it outlines provide valuable insights into how individuals perceive innovations and their potential reactions to them.

CONCEPTUAL FRAMEWORK: CATEGORIZING CONCERNS ABOUT MR
Based on the findings of Section 2, a preliminary conceptual framework was developed to categorize potential user concerns about MR systems. As this classification is derived from related literature, it can logically only serve as a framework for classifying the ongoing research within this domain. Acknowledging that such categorizations may not always align with users' subjective concerns or considerations is essential. Hence, this only represents an initial basis from which the subsequent construction of the scale could proceed as further explained in Section 4.
The decision to develop a preliminary conceptual framework for generating the questionnaire items rather than to base it on psychological models, such as the Innovation Resistance Theory (IRT) [60], was driven by the recognition that possible concerns regarding MR might extend beyond the generic barriers that are often defined for novel technologies or innovations as a whole. Herein, contemporary issues such as privacy, which are crucial in the field of MR, are often only implicitly addressed in existing models, if at all. Hence, deriving potential concerns from currently recognized challenges in MR was deemed more fitting, ensuring that the questionnaire reflects the nuanced research field of MR and addresses issues that may not be adequately captured by existing psychological models.

Security and Privacy: Contrasting, yet not Mutually Exclusive
The categorization of security threats in MR is based on the publication "Security and Privacy Approaches in Mixed Reality: A Literature Survey" [19]. It compiles various strategies suggested to maintain the security and privacy of users and data within the realm of MR in previous work. Furthermore, the researchers combined the already existing security and privacy properties from previous work [20,34,40] for a final scale consisting of six security-related properties and six privacy-related properties on each end, with one property being related to both. They observed that specific security attributes may be simultaneously perceived as potential privacy risks. They note that this underscores the variations in the emphasis placed on these attributes or prerequisites by different stakeholders.
This categorization provides a comprehensive overview of the security and privacy risks in MR that are presently recognized in research and actively addressed, conceivably also covering the concerns that users of MR systems might have in this regard. As a result, the aforementioned properties form two of the four principal categories within our framework.

Social Implications: Psychological Safety, Health, and Social Impact
Safety, specifically psychological safety, is another novel challenge in MR [27]. In this context, the publication "The Ethics of Realism in Virtual and Augmented Reality" [69] identified eleven potential psychological and social implications that should be considered in the future development of MR. Given the extensive range of potential social impacts, achieving comprehensive coverage is unattainable. Yet, to consider a broad range of potential psychological and social concerns, we chose to integrate each implication as a subcategory under the respective factor.

Public Acceptance: Perception and Trust
Numerous factors can potentially shape the public's willingness to embrace novel technologies. The publication "Socio-psychological determinants of public acceptance of technologies: A review" [29] sought to explore the psychological factors that underlie the societal acceptance of emerging technologies and assembled a list of the most frequently employed determinants found in related research.
We chose a subset of these determinants that seemed fitting for MR technology, especially considering the findings of Section 2.1. Herein, the primary emphasis centers on the perception of the technology rather than its actual properties and the level of trust in these systems.

SCALE FORMATION
After establishing a related-work-based initial conceptual framework for categorizing potential user concerns about MR, the subsequent phase involved developing a questionnaire that covers the genuine apprehensions of users. We followed a systematic procedure to accomplish this, as illustrated in Figure 1. This procedure is based on the scale development best practices proposed by Boateng et al. [3]. This approach closely aligns with the methodology employed for developing the PCTS [78], which also aims to capture critical sentiments regarding novel technologies.

Item Generation
The initial items were generated by two researchers, creating four items for each subcategory of the conceptual framework, resulting in a total of 120 items. As the related work [19,29,69] gave definitions for each property/implication, we generated items with similar, albeit slightly different, phrasing to allow for a more nuanced set of items in the end. Afterward, the authors discussed the items and revised those that sounded too similar.

Expert Feedback
Two rounds of expert feedback were carried out to reduce the substantial pool of initial items. In the first round, six experts were asked to give feedback on the initial set of items and indicate whether they considered each item essential for such a scale. The experts were researchers in the fields of privacy, security, VR/AR, and general HCI. The reduced set of items was chosen through voting: an item was retained only if at least three experts indicated it to be essential, and all other items were discarded. The remaining items were then discussed and improved upon by the researchers based on the initial feedback of the experts. This resulted in a final set of 48 items.
Subsequently, another round of expert feedback was gathered for a final iteration, specifically regarding the phrasing to ensure that all items are easily comprehensible and sufficiently distinct. Two of the experts had previous experience in developing questionnaires, while one expert, although knowledgeable about the process, had not previously engaged in questionnaire development.
To maintain balance and minimize potentially leading questions, half of the items in the final set were reversed. This was done to ensure that overwhelmingly negative phrasing would not skew responses, reducing bias where possible.
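The retention rule of the first expert round can be sketched in a few lines. This is only an illustration: the item identifiers and votes below are hypothetical, and only the threshold of three out of six experts is taken from the text.

```python
def retain_items(votes, threshold=3):
    """Return the IDs of items marked essential by at least `threshold` experts.

    `votes` maps an item ID to a list of booleans, one entry per expert.
    """
    return [item for item, marks in votes.items() if sum(marks) >= threshold]


# Hypothetical votes from six experts on three candidate items.
example_votes = {
    "item_01": [True, True, True, False, False, False],
    "item_02": [True, False, False, False, True, False],
    "item_03": [True, True, True, True, True, False],
}
retained = retain_items(example_votes)
```

With six experts, a threshold of three corresponds to at least half of the panel rather than a strict majority, so the `threshold` parameter makes that choice explicit.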
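Before analysis, responses to reversed items are usually inverted so that all items point in the same direction. A minimal sketch, assuming responses are coded 1 (Strongly disagree) to 5 (Strongly agree); the item names are hypothetical and not taken from the questionnaire.

```python
def reverse_score(response, scale_min=1, scale_max=5):
    """Invert a single Likert response (e.g. 1 <-> 5, 2 <-> 4 on a 1-5 scale)."""
    if not scale_min <= response <= scale_max:
        raise ValueError(f"response {response} outside {scale_min}-{scale_max}")
    return scale_min + scale_max - response


def score_responses(responses, reversed_items):
    """Return a copy of `responses` with all reversed items inverted."""
    return {
        item: reverse_score(value) if item in reversed_items else value
        for item, value in responses.items()
    }


# Hypothetical raw answers of one participant; item_b and item_c are reversed.
raw = {"item_a": 4, "item_b": 2, "item_c": 5}
scored = score_responses(raw, reversed_items={"item_b", "item_c"})
```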

First Survey
After developing the reduced set of initial scale items, based on related work and expert feedback, a participant study was executed to refine the item set further through exploratory factor analysis. In accordance with the sample size recommendation by Comrey [13], N = 200 participants were recruited.

4.3.1 Participants. Prolific was used to recruit participants, ensuring a more representative sample of subjects than through institute mailing lists or similar approaches. The participation was entirely voluntary, and the option to withdraw from the survey was available throughout. Participants were compensated with £1.50 upon completing the survey, corresponding to an average hourly reward of £15.15. The survey was conducted entirely online and took approximately 10 minutes to complete. The average age of participants was roughly 40 years (x̄ = 39.65, SD = 12.94), 50% identifying as male, 50% identifying as female, and all either currently residing in the United Kingdom or the United States.

Survey Structure.
To verify the robustness of our model in representing user concerns across various implementations of MR, four versions of the survey were created, with two versions introducing an AR prototype and two versions showing a VR prototype instead. Each version was shown to 25% of the participant pool, ensuring equal distribution. Furthermore, one prototype per technology was described as featuring functionality that is usually regarded as rather concerning, while the other prototype was selected to showcase features that are typically associated with lower levels of concern. This was done to ensure the scale could consistently gauge concerns across a spectrum of intensities for various types of MR technologies. They were each described with neutrally phrased text of roughly 200 to 300 words and a mockup image of the interface/system. The participants were asked to state how much they agreed with each item of the reduced item set on a 5-point Likert scale (Strongly disagree, Disagree, Neutral, Agree, Strongly agree). All four prototypes were based on related work and already existing technologies. The non-concerning AR system was an intelligent navigation system, showing navigational cues via holograms and rerouting the user based on their preferences and current traffic information. This is based on already existing systems, implemented and tested in both research and industry environments [2,57]. The concerning AR system was based on "FlirtAR" and "ARR, matey!", describing a dating app that would show information about the conversation partner and conversational suggestions via AR. The non-concerning VR system featured a virtual vacation application, similar to a multitude of readily available VR apps and related research [53,59]. Lastly, the concerning VR prototype featured a gaming scenario, which would adapt the difficulty based on the player's emotions and physiological signals, porting the preexisting work of Chanel et al. [10] into a VR environment.

Exploratory Factor Analysis
Analogous to the development of other scales in the field of HCI [55,75,78], the extraction of latent factors was conducted as proposed by McCoach et al. [52]. The results of the reversed items were inverted, and the Kaiser-Meyer-Olkin (KMO) criterion [68] was evaluated. With KMO values above 0.8 indicating satisfactory sampling adequacy and the result being KMO = 0.93 for the present dataset, we continued with the factor analysis. For this, the parallel analysis technique [33] was used in conjunction with a scree plot [9] to find the optimal number of principal axis factors. A varimax rotation was applied, as this orthogonal rotation method produces independent factors, aiming to allow the later reduction of items that load on multiple factors at once [79]. Herein, the scree plot analysis indicated three factors to be the optimal solution for the items at hand.
To further reduce the set of items to achieve a concise scale that is practical for application in MR research, items with factor loadings below 0.40 were removed, as they are generally considered inadequate for such models [61]. Items with significant cross-loadings were consequently removed as well. The final scale consists of 3 items per factor, resulting in 9 items in total. Cronbach's alphas, indicating the internal consistency of the (sub)scales, all show adequate consistency for the three factors, and the overall Cronbach's alpha of α = 0.85 for the scale as a whole confirms that suitable items were retained [15]. The Cronbach's alphas of the subscales and the factor loadings of the items are all shown in Table 2. The model displays a good fit with KMO = 0.81, a Tucker Lewis Index [7] of TLI = 0.98, and a Root Mean Square Error of Approximation of RMSEA = 0.049.
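For readers unfamiliar with the statistic, Cronbach's alpha for a set of k items compares the sum of the item variances with the variance of the summed score. The sketch below uses the standard textbook formula and Python's standard library only; the response matrix is hypothetical and does not reproduce the study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the per-respondent sum score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of five participants to one three-item subscale
# (reversed items are assumed to be inverted already).
subscale = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 1],
]
alpha = cronbach_alpha(subscale)
```

Values close to 1 indicate that the items vary together; perfectly redundant items yield exactly 1.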

Factor Naming. As the first three items (SP1, SP2, SP3) are a combination of the two subcategories Security and Privacy, we opted to name the first factor Security & Privacy. This is in accordance with the work of De Guzman et al. [19], as introduced in Section 3.1, where the two factors were also combined into one contiguous list of properties. The items SI1, SI2, and SI3 all stem from the Social Implications subcategory of the conceptual framework, making the naming of the second factor trivial. Interestingly, the last three items (T1, T2, T3) all were reversed items. While their content in part correlates with the Security and Privacy categories, they also closely align with the last category, that being Public Acceptance, or to be more precise, Trust. As other properties of Public Acceptance are no longer present in the final item set and with the first factor already covering the potential concerns regarding both Security and Privacy, we decided to name this factor Trust. With this factor only consisting of reversed items, we hope also to reduce the latent negative bias that might stem from the critically phrased items of the preceding factors.

SCALE EVALUATION
With the three final factors of the scale being determined, the MRC Questionnaire could now be evaluated appropriately. This process followed Phase 3 of scale development by Boateng et al. [3] and included two further surveys.

Second Survey
The first of the two surveys for evaluation was carried out to gather data for a confirmatory factor analysis, convergent/divergent validity, and differentiation by known groups.

Participants.
As with the first survey, see Section 4.3, we again chose to use Prolific as the recruitment platform for this survey. Similarly, the participation was entirely voluntary, and the option to withdraw from the survey was available throughout. Participants were compensated with £1 upon completion of the survey, which corresponds to an average hourly reward of £13.62. In total, N = 100 participants were recruited for this survey. It was conducted entirely online and took approximately 5 minutes to complete. The average age of participants was again roughly 40 years (x̄ = 39.83, SD = 12.05), 50% identifying as male, 50% identifying as female, and all either currently residing in the United Kingdom or the United States.
5.1.2 Survey Structure. Participants were shown one of two prototypes for assessment, one again being expected to yield comparatively few concerns and one raising potentially more concerns. Each was depicted using a neutrally worded description, spanning approximately 200 to 300 words, along with a mockup image of the interface/system. Each participant was randomly assigned one of the two prototypes, and both were shown with equal frequency.
Both systems offered the same fundamental feature set, namely an AR application offering contextual information for tourists in cities unfamiliar to them. This included the navigation to relevant points of interest through holograms that blend into the environment for unobtrusive cues, offering an adaptive AR experience. The second prototype introduced an additional feature, specifically blocking the view of parts of reality based on user preference. The hypothetical adaptive AR base system is based on related work [16,39], and the added view filter has been discussed in recent publications [21] and expert interviews as well.
After an introduction to the prototype at hand, participants were instructed to state their agreement with each of the items of the final MRC Questionnaire as shown in Table 2. Additionally, they were asked to complete both the PCTS [78] and the UEQ [66] for the shown prototype to facilitate convergent/divergent validity tests.

Confirmatory Factor Analysis
To evaluate the structural validity of the scale, we performed a Confirmatory Factor Analysis (CFA). Herein, the dimensionality of the model can be verified through systematic fit assessments, confirming the structure of the model if certain thresholds are met [3]. With a Tucker Lewis Index of TLI = 0.98, a Comparative Fit Index of CFI = 0.99, and a Root Mean Square Error of Approximation of RMSEA = 0.059, the results are indicative of an internally consistent model with a fair to close fit. As seen in Figure 2, the subscales exhibit a moderate to high correlation, implying that the theoretical scale is reasonable. The Cronbach's alphas for the three subscales are α = 0.92 for Security & Privacy, α = 0.85 for Social Implications, and α = 0.79 for Trust, respectively.
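The three fit indices reported here are all derived from the chi-square statistics of the fitted model and of a baseline (independence) model. The sketch below uses the commonly cited textbook definitions; the chi-square values, degrees of freedom, and sample size in the example are hypothetical, as the text does not report them, and some software divides by N instead of N − 1 in the RMSEA.

```python
import math


def fit_indices(chi2_model, df_model, chi2_base, df_base, n):
    """Return (TLI, CFI, RMSEA) from model and baseline chi-square statistics."""
    # Noncentrality estimates, truncated at zero.
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, 0.0)

    # RMSEA: misfit per degree of freedom, scaled by sample size.
    rmsea = math.sqrt(d_model / (df_model * (n - 1)))

    # CFI: relative reduction in noncentrality compared to the baseline model.
    denom = max(d_model, d_base)
    cfi = 1.0 if denom == 0 else 1.0 - d_model / denom

    # TLI: compares the chi-square/df ratios of baseline and fitted model.
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    tli = (ratio_base - ratio_model) / (ratio_base - 1.0)
    return tli, cfi, rmsea


# Hypothetical input values, chosen only to illustrate the computation.
tli, cfi, rmsea = fit_indices(
    chi2_model=30.0, df_model=24, chi2_base=600.0, df_base=36, n=100
)
```

In practice these values are taken from the output of a structural equation modeling package rather than computed by hand; the function only makes explicit how the three numbers relate to one another.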

Construct Validity
As the two prototypes for the second survey were deliberately chosen to differ in the level of concern they raise, sharing the same set of basic features while the second one added a real-life filter that is already critically discussed in current literature, a differentiation by known groups is possible. Afterward, the MRC Questionnaire is compared to existing scales to evaluate if and how different concepts correlate with the proposed model.

Differentiation by known groups.
The two prototypes for the second survey were intentionally selected to raise varying levels of concern by sharing the same fundamental features, with the second introducing additional functionalities that have already been the subject of critical discussion in current literature. A differentiation by known groups can be performed on the assumption that the second prototype will cause significantly more concern among the participants. This approach was first proposed by Churchill et al. [11] and was applied analogously in previous scale development processes [55,78]. The results of the second survey, divided into the two prototypes and analyzed separately, support this assumption. After a Shapiro-Wilk test indicated that a normal distribution could be assumed (W = 0.99, p = 0.34) and Levene's test indicated homogeneity of variances (L(1, 96) = 1.38, p = 0.24), an independent t-test (t(96) = −3.36, p = 0.001) revealed that the resulting score of the MRC Questionnaire for the first scenario (x̄_MRC = 29.1, SD_MRC = 6.96) was significantly lower than for the second scenario (x̄_MRC = 33.6, SD_MRC = 6.3). Table 3 shows the full results of this step.
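The reported test statistic can be approximated from the summary statistics alone. The following sketch computes a pooled-variance (Student) t statistic; the group sizes of 49 participants each are an assumption inferred from the 96 degrees of freedom and are not stated in the text.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    # Standard error of the difference between the two group means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Means and standard deviations as reported; n = 49 per group is assumed.
t_stat, df = pooled_t(29.1, 6.96, 49, 33.6, 6.3, 49)
```

Under this assumption, the statistic comes out at roughly −3.36 with 96 degrees of freedom, which is consistent with the reported result.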

Convergent/Divergent Validity.
To compare the results from the MRC Questionnaire with established questionnaires, participants evaluated the presented hypothetical prototypes using not only the MRC but also the PCTS [78] and the UEQ [66].
As the PCTS is one of the only scales that explicitly sets out to measure negative sentiments towards technologies, a high correlation between the MRC Questionnaire and it is desired.The PCTS assesses the perceived creepiness of a technology in regards to the three factors Implied Malice, Undesirability, and Unpredicability.One might assume that when individuals perceive a technology as having potential security or privacy vulnerabilities, they may consider it undesirable.The presence of security and privacy concerns might undermine the technology's trustworthiness, potentially making it less predictable in turn.Furthermore, when users perceive a technology as having social implications that may disrupt or harm societal norms, they may interpret these consequences as indicative of implied malice.To the best of the authors' knowledge, there is currently no other questionnaire specifically designed to evaluate negative sentiments toward emerging technologies directly.As Figure 3 shows, the MRC and PCTS correlate ( = 0.58, 95% CI = 0.43, 0.70), indicating that the perceived feeling of creepiness evoked by an MR system and the magnitude of concerns raised in relation to it are both impacted similarly.While a simple correlation test cannot prove the above-mentioned hypothesized causations, the scales do correlate as expected.While we assume that the PCTS assesses the feelings (i.e., invoked creepiness) that are a reaction to the system's concerns, and with this its inherent properties, further research is needed to prove this connection.
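Confidence intervals for a correlation coefficient are commonly obtained through the Fisher z-transformation. The sketch below illustrates this approach; it is not necessarily the procedure used for the reported intervals, and the sample size of 98 is an assumption inferred from the degrees of freedom of the t-test above.

```python
import math


def correlation_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)            # Fisher z-transformation of the coefficient
    se = 1 / math.sqrt(n - 3)    # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


# Correlation between MRC and PCTS as reported; n = 98 is assumed.
low, high = correlation_ci(0.58, 98)
```

Under these assumptions, the interval rounds to 0.43 and 0.70, matching the reported bounds for the MRC and PCTS correlation.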
We incorporated the UEQ [66] for another comparative assessment. The comparison with the UEQ is particularly valuable due to its widespread use and established reputation as a comprehensive tool for assessing overall user experience, encompassing classical usability aspects as well as user experience dimensions. Among the available questionnaires, the UEQ was chosen for its versatility and applicability across various technological contexts, providing a well-established benchmark for comparison. While factors like efficiency or perspicuity can be hard to assess through a text description and a mockup image only, we specifically focused on the two hedonic qualities, stimulation and novelty. Our interest in these hedonic qualities arises from the hypothesis that when a new device is perceived as subpar or unneeded, users may harbor more concerns. Conversely, when a new system is viewed as exceedingly novel and futuristic, concerns may stem more from unfamiliarity than from substantive issues with the device. However, the test results reveal that both stimulation (r = 0.22, 95% CI [0.02, 0.39]) and novelty (r = 0.17, 95% CI [−0.03, 0.35]) exhibit a low correlation with the MRC, suggesting that concerns related to MR systems encompass more than just stimulation and novelty. For completeness, all UEQ scales are shown in Figure 3.
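For readers who want to relate a correlation coefficient to its confidence interval, the sketch below uses the Fisher z-transformation, a common way to obtain such intervals. Whether the intervals reported here were computed this way is not stated, so the method is an assumption, as is the sample size of 98 (inferred from the 96 degrees of freedom of the t-test in the same survey). The function name is illustrative.

```python
import math
from statistics import NormalDist

def pearson_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r via Fisher's z-transformation."""
    z = math.atanh(r)                      # Fisher z of the observed correlation
    se = 1 / math.sqrt(n - 3)              # standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# r as reported for MRC vs. PCTS; n = 98 is an ASSUMPTION.
lo, hi = pearson_ci(0.58, 98)
print(f"95% CI = [{lo:.2f}, {hi:.2f}]")
```

Because the published coefficients are rounded to two decimals, intervals recomputed this way may differ from the reported bounds in the last digit.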

Third Survey
In addition to the Cronbach's alphas reported in Section 5.2 as tests of reliability, we performed a further test-retest reliability evaluation by conducting a third and final survey.
5.4.1 Participants. Instead of using Prolific for recruitment, as in the first two surveys, participants were invited to take part through institute mailing lists and snowball sampling. In the end, a total of N = 12 people participated in the online survey, which took approximately 5 minutes to complete. Again, participation was entirely voluntary, and the option to withdraw from the survey was available throughout. No compensation was given for the third survey. The average age of participants was roughly 27 years (x̄ = 27.25, SD = 4.0), with two-thirds (n = 8) identifying as female and the rest identifying as male, and all currently residing in countries of the European Union. As noted by Mejia and Yarosh [55], recruiting enough people for two survey runs is often difficult, which is usually why a test-retest evaluation is omitted. We nevertheless opted to perform this validation, even though only a small sample size could be achieved. The time between the two runs was set to at least ten days to ensure a sufficiently long interval between the two reflections on the presented prototype.
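5.4.2 Survey Structure. Participants were shown one hypothetical prototype, for which, based on the explained feature set, relatively high values were to be expected. It consisted of an AR social application that enabled users to receive automatic information about their conversation partners through facial recognition. Additionally, it provided the functionality to rate individuals and conversations publicly. This concept was based on related work [32,37] and a now-defunct social media platform with a similar set of features 8.
The MR system was presented in a 260-word description, and a mockup of a potential interface for such an application was supplied. Afterward, participants were instructed to state their agreement with each of the items of the final MRC Questionnaire as shown in Table 2.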

Test-Retest Reliability
As suggested by Rousson et al. [64], we evaluated the Pearson product-moment correlations for both the subscales and the MRC Questionnaire as a whole. While the Security & Privacy subscale only showed an acceptable correlation for a test-retest context [12], the two other subscales showed much higher correlations. In total, the MRC Questionnaire exhibits a moderate to excellent test-retest reliability (r = 0.85, 95% CI [0.54, 0.96]). The correlation plots and respective correlation values are shown in Figure 4. Based on this reliability test, and bearing in mind the small sample size, it can be assumed that the MRC Questionnaire shows temporal stability and can be used in repeated-measures studies.
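To illustrate how such a test-retest coefficient is obtained from paired scores, the following minimal Python sketch computes the Pearson product-moment correlation between two survey runs. The twelve score pairs are hypothetical values made up for illustration; they are not the study data, and the function name is an assumption.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# HYPOTHETICAL total MRC scores of the same 12 participants in both runs.
run1 = [28, 35, 22, 40, 31, 26, 37, 19, 33, 29, 24, 38]
run2 = [30, 33, 24, 41, 29, 27, 35, 21, 34, 26, 25, 39]
r = pearson_r(run1, run2)
print(f"test-retest r = {r:.2f}")
```

The same function can be applied per subscale by passing the subscale scores of both runs instead of the totals.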

DISCUSSION
In this section, we present instructions on using the MRC Questionnaire and interpreting its results. Furthermore, we explain the limitations of our approach and the scale as well as ideas for future enhancements.

Scoring
The MRC Questionnaire is scored on a 5-point Likert scale, ranging from Strongly disagree (1) to Strongly agree (5). All items of the Trust subscale are reverse-coded. Summing the nine items therefore yields a total score ranging from 9 (lowest) to 45 (highest). Higher scores signify greater concerns associated with the MR system.
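The scoring rule can be sketched in a few lines of Python. The item keys (sp1, si1, t1, and so on), the subscale identifiers, and the function name are illustrative assumptions; the actual item wording and order are given in Table 2. Only the 1-to-5 response range, the three items per subscale, and the reverse-coding of the Trust items are taken from the description above.

```python
from typing import Dict

# Illustrative item keys (ASSUMED); three items per subscale as in Table 2.
SUBSCALES = {
    "security_privacy": ["sp1", "sp2", "sp3"],
    "social_implications": ["si1", "si2", "si3"],
    "trust": ["t1", "t2", "t3"],
}
REVERSE_CODED = {"trust"}  # all Trust items are reverse-coded

def score_mrc(responses: Dict[str, int]) -> Dict[str, int]:
    """Return subscale scores and the total score (9 to 45) for one participant."""
    scores = {}
    for subscale, items in SUBSCALES.items():
        total = 0
        for item in items:
            value = responses[item]
            if not 1 <= value <= 5:
                raise ValueError(f"{item}: response must be between 1 and 5")
            # Reverse-coding maps 1<->5 and 2<->4 on a 5-point scale.
            total += 6 - value if subscale in REVERSE_CODED else value
        scores[subscale] = total
    scores["total"] = sum(scores[name] for name in SUBSCALES)
    return scores

# A participant with maximal concern: full agreement with the concern items,
# full disagreement with the Trust items.
example = {"sp1": 5, "sp2": 5, "sp3": 5, "si1": 5, "si2": 5, "si3": 5,
           "t1": 1, "t2": 1, "t3": 1}
print(score_mrc(example)["total"])
```

Keeping the reverse-coding inside the scoring function means raw responses can be stored exactly as participants entered them.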

Guidelines and Limitations to Administering the Scale
A measuring instrument such as the presented MRC Questionnaire, which is designed to assess concerns related to MR systems, can be immensely valuable for the research, development, and improvement of these technologies. Such an instrument might serve as a crucial tool in several ways. This scale is intentionally designed not to assess the specific, objective problems or risks associated with a technology but rather to focus on user apprehensions and concerns. Its primary purpose is to measure the subjective perceptions and feelings of users regarding a technology, particularly any unease or worries they may experience. By concentrating on user apprehensions, the scale aims to capture the emotional and psychological aspects of how MR systems might be perceived even before actual user experiences can be gathered. It recognizes that people's perceptions and concerns can vary widely, even when faced with similar objective risks or issues. Therefore, the scale provides a means to gauge how users interpret and respond to these risks on a personal level.
Conversely, it can also be used to assess actual implementations. Users' apprehensions often reveal pain points or areas of discomfort about the technology at hand. This information is valuable for pinpointing specific issues that may need addressing, whether they relate to security, privacy, social implications, or the inherent trust in the system. User concerns can also guide the development of educational materials or resources to help users understand the technology better. Addressing misconceptions or alleviating fears through education can contribute to a more positive user experience. In summary, while the scale's primary focus is on assessing user apprehensions and perceptions, it can serve as a versatile tool for evaluating aspects of the user experience in actual technology implementations that other scales currently do not assess. By understanding and addressing user concerns, developers can enhance the overall quality and acceptance of MR systems and other technologies.
The preceding evaluation suggests that the MRC Questionnaire is suitable for both between-subject and within-subject studies, as well as for repeated-measures studies. Although the analysis of the subscales generally presents favorable results for evaluating them on their own, we do not explicitly recommend this application. The intentional brevity of the scale serves the purpose of offering a quick initial insight into potential user concerns. However, the precise nature of these concerns should be explored through additional qualitative research and is likely to be highly specific to the particular MR system under consideration. As illustrated by the conceptual model in Section 3, the realm of potential reasons for concern is too expansive to encompass within a single scale suitable for a wide range of applications. Once again, this scale is designed primarily to provide an initial understanding of potential user concerns.
Finally, it is crucial to emphasize that this scale is not inherently linked to the acceptability of a system. Although we assume that the absence of concerns can certainly impact acceptability, numerous other factors may come into play. To capture these, other scales and questionnaires, like the ones presented in Section 2.3 and Section 5.3.2, should be used in conjunction with the MRC Questionnaire.

Limitations of the Development Process
Besides the aforementioned limitations on how the scale can be used and evaluated, we acknowledge that the development process of the MRC Questionnaire is subject to certain limitations, too. First and foremost, the exploratory factor analysis, as well as all subsequent evaluation stages, was conducted during a period when MR technologies were gradually making their way toward broader public acceptance. The trajectory of development and widespread adoption of these devices in the coming years remains uncertain. Consequently, it is likely that opinions, perceptions, and concerns will change over time. Therefore, a reevaluation of the scale may become necessary in the future.
Much like the PCTS [78], we opted to concentrate on developing a scale that evaluates users' concerns and apprehensions immediately after the first introduction to an MR system. Due to this, the suitability of the MRC for long-term studies remains uncertain. While we expect that the scale has the potential to measure how user concerns change over time, this capability cannot be affirmed at present.
Additionally, the study primarily involved participants from countries with a Western cultural background, and as the surveys were conducted online, all participants possessed at least a basic understanding of current consumer electronics. While we hope for the scale to have relevance in diverse cultural contexts and among individuals with varying levels of familiarity with consumer electronics, we cannot guarantee this outcome. Ideally, future research will address this issue, facilitating cross-cultural and demographic comparisons of different concerns and apprehensions that people might have regarding MR systems.
The lack of real exposure testing leaves it uncertain how external factors (e.g., user context [65] or situationally perceived cognitive workload [48] during MR use) affect the questionnaire's performance in capturing concerns when interacting with MR systems. The potential biases or deviations in user responses under actual MR exposure conditions warrant consideration since they could impact the questionnaire's reliability and validity in such contexts (cf. [4,46,74] for biased study data when users have specific expectations towards novel technologies). To address this limitation, future research should prioritize evaluations in which participants exposed to operational MR systems complete the MRC Questionnaire. This approach will provide a more comprehensive understanding of the MRC Questionnaire's effectiveness in capturing user experiences. Additionally, incorporating user feedback from authentic MR interactions will contribute to refining the questionnaire for increased applicability and relevance in practical settings.

CONCLUSION
We present a measurement tool designed to evaluate user concerns and apprehensions regarding MR systems. Initially, we constructed a conceptual model outlining potential concerns associated with MR systems, drawing insights from existing research. Subsequently, we engaged in two rounds of expert feedback to generate a comprehensive set of survey items. A total of three surveys were conducted to first reduce this set of items and then evaluate the final MRC Questionnaire.
The questionnaire shows high internal consistency, adequate temporal stability, and high convergent and divergent validity. It serves as a valuable instrument for assessing the initial concerns individuals may harbor when encountering a new MR system. Furthermore, its intentional brevity enables its application in various studies and situations where an initial understanding of potential apprehensions is required.
We aspire for this scale to help researchers and developers cultivate a constructive approach to these concerns. It can serve as a tool to ensure that new MR artifacts and applications transparently convey their intentions, features, and potential impact on both users and bystanders. While this assessment could prove beneficial for educational purposes, it is essential to emphasize that addressing potential concerns primarily falls within the realm of technological development rather than solely relying on user adaptation or adjustment.
The questionnaire and supplementary material are openly accessible on the research group's website 9 .

Figure 1: The process of developing the scale, as this paper outlines.

5.4.2 Survey Structure. Participants were shown one hypothetical prototype, for which, based on the explained feature set, relatively high values were to be expected. It consisted of an AR social application that enabled users to receive automatic information about their conversation partners through facial recognition. Additionally, it provided the functionality to rate individuals and conversations

Figure 3: The main diagonal shows the histograms of each metric; the lower triangular shows the correlation plots between the metrics, and the upper triangular shows the corresponding r-values for the different scales under comparison. It is evident that the MRC and PCTS [78] highly correlate. Furthermore, the MRC correlates with both the Attractiveness and Dependability subscales of the UEQ [66].

Figure 4: The different subscale and overall scores for both runs of the third survey. Furthermore, the Pearson product-moment correlation is given for all plots.

Table 1: The preliminary conceptual framework with its four categories and their respective subcategories aiming to classify potential user concerns regarding MR systems. This model will be used to develop the scale in the following.

Table 2: The final MRC Questionnaire, comprising three factors with three items each. Also reported are Cronbach's alphas and factor loadings based on the first survey results.

Figure 2: The result of the CFA confirms this three-factor model for the scale, with moderate correlations between the subscales and mostly high item coefficients.

Table 3: Differentiation by known groups.