Connecting Home: Human-Centric Setup Automation in the Augmented Smart Home

Controlling smart homes via vendor-specific apps on smartphones is cumbersome. Augmented Reality (AR) offers a promising alternative by enabling direct interactions with Internet of Things (IoT) devices. However, using AR for smart home control requires knowledge of each device’s 3D position. In this paper, we introduce and evaluate three concepts for identifying IoT device positions with varying degrees of automation. Our mixed-methods laboratory study with 28 participants revealed that, despite being recognized as the most efficient option, the majority of participants opted against a fast, fully automated detection, favoring a balance between efficiency and perceived autonomy and control. We link this decision to psychological needs grounded in self-determination theory and discuss the strengths and weaknesses of each alternative, motivating a user-adaptive solution. Additionally, we observed a “wow-effect” in response to AR interaction for smart homes, suggesting potential benefits of a human-centric approach to the smart home of the future.


INTRODUCTION
Smart Home interactions currently come with multiple hurtful User Experience (UX) challenges, as users rely mostly on smartphones for interacting with connected household devices [39]. This forces users into long journeys: find the phone, unlock it, find the vendor-specific application, locate the target device using the vendor-specific graphical user interface (GUI), find the desired functionality within those supported by the device, and finally trigger the action. Smart light switches, which often have several buttons to control light temperature, color, intensity, or custom actions, simplify this lengthy process at the expense of usability: users must memorize (sometimes quite complex) button combinations to control their home.
One particularly cumbersome setup aspect is locating individual devices and accessing their controls. A possible way to overcome this problem is to display user interfaces in visual proximity to the target device via Augmented Reality (AR)¹. With current developments in the consumer market (e.g., Apple's launch of the Vision Pro) hinting at the long-promised consumer-market-grade maturity of AR technologies, AR applications are approaching large-scale deployment, particularly in domestic spaces. Consequently, investigating AR solutions to Smart Home problems seems a promising approach, since AR could provide several UX benefits, such as on-the-fly interactions and more natural and intuitive interaction designs. Further, AR not only removes the spatial dissociation between the target device and its user interface, but also simplifies the user journey immensely by offering larger areas for displaying visual content and interactive elements. Importantly, an AR interface would strongly reduce reliance on smartphones, which have been increasingly considered a negative presence in households [26,35,40].
Despite the advantages and comfort offered by AR-based Smart Home interaction, this field of application is still at an early stage of development. Assembling a Smart Home requires setting up many products, often from different manufacturers and with diverse characteristics. Many vendors alleviate the installation process through "plug and play" products, which are configured into the Smart Home system with varying degrees of automation (e.g., Amazon's Frustration-Free Setup²). However, the configuration of AR elements in Smart Home applications is typically done by hand. This is particularly inconvenient in terms of matching the positions of physical devices to coordinates in the spatial models of AR frameworks. Multiple technical solutions have been proposed to simplify this task, such as the use of visual markers and QR codes, or indoor location mechanisms. These solutions present drawbacks in terms of product design and production costs, as well as a compromise in practicality (e.g., QR codes must be scanned individually).
In this paper, we propose a technique to solve the problem of device localization. We take advantage of the sensors available on AR devices and the actuators present on connected household appliances. By making appliances blink, buzz, or call for attention the best way they can, we enable AR devices to identify them individually and calculate their physical coordinates in the real world. Our solution is manufacturer-independent, allows for a high level of automation, and requires no additional hardware. It can be retroactively applied to many legacy devices and requires no significant implementation costs for future designs.
In designing a feasible setup method, it is paramount to consider the experience that users have during the configuration of Smart Homes. This initial contact with Smart Home technology can have a conditioning effect on the long-term subjective perception of interactions and, in extreme cases, can result in discouraging levels of frustration. To that end, recent research on positive computing [36] emphasizes the importance of looking beyond classic usability factors like ease of use, especially for the interaction with pervasive technologies that accompany people in their lives. Thereby, the innate psychological needs for autonomy, competence, and relatedness ought to be recognized in the design of technology interactions like AR Smart Home setups. This recognition could ensure that users experience a lasting, positive connection to their Smart Home, established right from the start [2]. However, as this idea has not yet been pursued with Smart Home interactions, it is paramount to explore different interaction designs and gain a better understanding of their impacts on users' needs and preferences.
As a starting point for the design spectrum, we herein mainly considered the degree of setup automation, as we expected substantial differences in how this dimension could affect psychological needs. Initially, we considered that a manual setup could be the one that maximally fulfills these needs, as it provides full control over the setup process (providing a high degree of autonomy), could instill a sense of mastery by completing the setup actively (providing a high degree of competence), and could create a sense of connection to the system through this engagement (providing a high degree of relatedness), a sensation also known as the IKEA effect [32]. However, we also expect that a fully manual setup could easily become demanding and frustrating, especially when the number of smart devices that need to be set up increases, making us aware of likely trade-offs between the recognition of psychological needs and classical usability dimensions [44]. In contrast, a fully automated setup might be easier and more convenient, yet might move the user "out of the loop". Thereby, a fully automated setup could also be experienced as alienating and disconnecting. Therefore, we considered how to possibly mitigate these trade-offs and achieve an effective balance between psychological needs recognition and usability through a semi-automatic setup process, leading us to a final set of three interaction design variants. Altogether, these design considerations gave rise to these research questions, aiming to explore emerging experience trade-offs:
• RQ1: Does a manual Smart Home spatial setup design maximize psychological need recognition, and do classical usability dimensions undermine the benefit of this characteristic?
• RQ2: Does a fully automated spatial setup optimize classical UX dimensions like ease of use, mental workload, and frustration, but reduce the attractiveness of the interaction design by thwarting psychological needs?
• RQ3: Does a combination of manual and automated features strike an effective balance between psychological needs recognition and classical UX dimensions, effectively enhancing technology acceptance?
² https://developer.amazon.com/frustration-free-setup (Accessed: 02/19/2024)
We pursue answers to these questions through a mixed-method study that includes a prototypical implementation of the device localization system for the manual setup scenario, and a Wizard-of-Oz study for the (semi-)automatic scenarios. By recruiting a diverse sample (age 19-64, 54% female, mixed residence types and experience with Smart Homes and AR) and conducting the study in a state-of-the-art Smart Home lab, we approximate how this AR Smart Home setup would be experienced in a real-world setting in the future.
Our work contributes to the Human-Computer Interaction (HCI) community in the following ways:
• We provide a system to support and facilitate device localization during AR configuration of Smart Homes. The code for the project is provided with the article.
• Overall, we find that users report high levels of engagement with this AR-based Smart Home interaction, highlighting the approach as a promising design option for future work in the HCI domain.
• Furthermore, through the combination of quantitative and qualitative evidence, we provide a comprehensive and in-depth account of users' experiences, highlighting that indeed, the combination of control and automation provided a good mixture of need recognition and usability, indicating high levels of technology adoption.
• At the same time, we also find substantial experience contrasts, for example that some participants do report a strong sense of connection to the Smart Home environment whereas others remain indifferent about it. Paired with the observation of different preferences for the degree of active interaction, we outline implications and design recommendations for following work.

RELATED WORK
Smart Homes are characterized by connecting several devices, automation features, and remote control [23] with goals like helping users or increasing hedonic value, e.g., through aesthetic home improvements [22]. The Smart Home extends to numerous device categories such as lights, speakers, thermostats, blinds, household appliances, sensors, and more [41]. Users can choose between individual devices for selected functions, ecosystems from specific vendors or consortiums such as Home Connect³ usually using a common Smart Home hub, or integrator solutions such as Home Assistant⁴ that combine fragmented ecosystems [21]. In recent years, Smart Home research has evolved from engineering disciplines to several other fields, such as HCI, and inspires interdisciplinary research [61].

Smart Home Research in HCI
Yao et al. [61] identify five trends in Smart Home research within the HCI community: interaction design, user behavior, smart devices, design exploration, and data, privacy, and security. Data privacy and security is currently the most prominent research stream [61]. While Smart Home devices connected to the internet pose various privacy and security risks [1], users generally trust Internet of Things (IoT) devices and manufacturers [62].
Regarding user behavior, Wozniak et al. [57] observed distinct roles from passive users to active users, and administrators. Administrators face a trade-off between professionally installed, well-integrated, and pre-configured systems and more flexible, adaptive, cheaper retrofit systems that usually require more effort for device selection, setup, and configuration [21]. While tasks like selection, setup, and configuration are usually carried out by interested users from the administrator role, they still face significant challenges and have to build up knowledge for their Smart Home [21]. Household members who only use the system typically rely on several vendor-specific Smart Home apps on smartphones or wall-mounted tablets [49], voice-based assistants such as Amazon Alexa [13], buttons on the devices themselves, or remote controls to interact with Smart Home devices. Still, they often require training and need to remember button combinations, voice commands, app layouts, and the affordances of smart devices in general. Thus, researchers are demanding more natural interactions [61].

AR, Indoor Positioning & The Smart Home
Integrating AR technologies into Smart Homes is a promising area of application, which has seen many efforts in diverse areas like elderly care [3], energy management [63], and nutrition support [28].
A subset of this work aims to provide insights and recommendations for AR integration with Smart Homes in general terms. Mahroo, Greci and Sacco propose a framework for AR-based interaction with Smart Homes and their components [30]. Their work focuses on the defining features of this application, namely the spatial aspects, such as the alignment of mixed elements, and the interconnection of the components. Jo and Kim delve further into the technical aspects, identifying the main components to achieve synergetic integration [24].
Devices are usually assigned an area (e.g., a room within the house), and can be grouped for it (e.g., turning on all lights in a room at once). Thus, the exact location of each device is not known to the system, but also not required in a traditional setup. However, the three-dimensional position of a device is necessary for advanced use cases. Especially for applications that connect AR glasses to the Smart Home, the precise location of the devices is required for an unmediated, natural interaction. There are numerous technologies for indoor localization ranging from radio-frequency-based approaches to inertial sensors, ultrasound, and visible light communication [7,25]. Ultra-wide-band systems can precisely track beacons placed on IoT devices [31] and visible light communication can track devices without congesting radio-frequency bands [53].
Yet, AR devices can locate themselves within a 3D coordinate system without the need for additional external devices [12], thus enabling automations in the Smart Home that further reduce the number of required interactions and determine the relative localization of other devices [45]. For instance, "Smart ARbnb" [16] provides transparency of device capabilities and automations for guest users by detecting light patterns of small LEDs next to each smart device with their smartphone camera. Similarly, several papers discuss effective locators for use cases such as spatial automation creation [19,47], privacy awareness [38,51], or providing context-sensitive, relevant information [17,48]. Leveraging the spatial aspect of AR, Wu et al. developed Megereality, a model for gestural interaction using multiple devices in AR [58]. Their work attempts to break the barrier between the physical and digital realms by using metaphors and embodying abstract processes.
Presently, this existing work focuses on running systems. Thereby, installation, configuration, matching, and integration of AR components with their physical counterparts is performed by an administrator and rarely discussed. However, the setup is a critical aspect of Smart Home popularity and, although it is likely done just once, it can have a significant detrimental UX effect [21]. The challenge of configuring spatial Smart Home settings in AR was considered by van der Vlist et al. in their work on semantic connections [55]. This concept attempts to facilitate a better user understanding of their Smart Home configuration using visible lines and symbols displayed with a small projector. Another approach allows users to set individual privacy settings by pointing an AR device towards any IoT device during setup [8]. Lyu et al. [29] created HomeView to automatically derive a digital twin of Smart Homes based on AR captures, reducing the need for continuous manual reconfiguration of device positions.
From the literature, it is clear that breaking the division between the real world and the spatial model is critical, yet challenging. This duality becomes particularly relevant for AR applications in Smart Homes since it is key to enabling the kind of interaction that can truly benefit the user. Thus, solving the problem of matching spatial coordinates with Smart Home devices presents an opportunity for a valuable contribution to both the AR and Smart Home communities. Furthermore, as the device setup is the entry point for many Smart Home experiences, anticipating the UX impacts of interaction designs is vital for effective innovation at this intersection of AR and Smart Homes.

Self-Determination Theory
We hypothesize that a Smart Home setup process must satisfy the homeowners' psychological needs to enable a lasting positive UX and adoption. Self-determination theory (SDT), initially proposed by Deci and Ryan [42], posits that a positive life experience is fundamentally rooted in the fulfillment of psychological needs. Central to SDT is the idea that individuals have innate psychological needs, and the satisfaction of these needs can foster optimal growth and well-being [42]. These needs are:
(1) Autonomy: The sense of volition and being the origin of one's behavior.
(2) Competence: The feeling of effectiveness in one's actions.
(3) Relatedness: The feeling of connection and belonging with others.
While the theory has been extensively applied and confirmed in the education [59] and work domains [15,33], HCI scholars too have found it to be a useful vehicle for the design and evaluation of positive user experiences, especially in games [6,43,54], but also in general as an extension to classical UX considerations [36]. Understanding and incorporating these psychological needs can significantly influence user experience. For instance, a system or interface that supports a user's sense of competence can enhance engagement, satisfaction, and persistence in interaction. Likewise, providing users with choices (supporting autonomy) and fostering a sense of community or connection (supporting relatedness) can further enhance user engagement and satisfaction [36].
In some more specific instances, previous HCI work has explicitly investigated how psychological needs recognition can improve the design of interactions with intelligent technologies like chatbots [60], robots [27], and recommendation agents [11], showing that the recognition of psychological needs creates higher engagement, deeper interaction, and longer-lasting acceptance of such intelligent systems.
While the approaches to need fulfillment in interaction design differ somewhat from application to application, there appears to be a certain consensus that autonomy can be fostered by providing control, for example by allowing customization and meaningful choices whenever possible so that users feel they have a say in how they interact with the technology [36,60]. For competence support, it is recommended that interactions enable gradual skill development and provide positive feedback and reinforcement for completing tasks successfully to enhance users' feelings of mastering a particular task [36,54]. Relatedness is, on the one hand, primarily fostered by incorporating social elements into the interaction design that enable interaction with others, such as social media integration, collaboration features, or community forums, to create a sense of connection with other users [36,54]. On the other hand, relatedness is also considered as a connection to the technology, which can be enhanced by tailoring the system to the individual's preferences. This personal touch supposedly enhances the sense of connection between the user and the technology [36,60].
Besides these previous works, psychological needs have not yet been considered in the context of Smart Home technologies. However, we argue that this is a vital application domain. As thwarting psychological needs is known to reduce general well-being [15,33], the interactions that individuals have with the technologies in their own homes must be designed to support these needs due to the pervasiveness of the interaction in everyday life. Furthermore, we argue that the recognition of these needs will have an important influence at the very early stages of a Smart Home interaction. In a sense, first interactions with a Smart Home should leave a pleasant impression to elicit positive spillover effects for subsequent everyday interactions.

APPLICATION AND EXPERIMENTAL SETUP
To develop an effective solution for AR-based Smart Home setups, we created the Prototypical Augmented Reality Configuration System (PARCS), a system capable of determining smart device positions. The PARCS is manufacturer-independent and works under the assumption of a working Smart Home setup without any initial knowledge about the position of any device. PARCS combines the actuators present in Smart Home appliances with the sensing capabilities commonly provided by Head Mounted Displays (HMD). Each Smart Home device provides a distinctive signal, e.g., by switching LED power indicators on and off, emitting specific sounds, or performing visually distinctive movements, thus allowing the cameras and microphones integrated into an HMD to pick up those cues and calculate the device's position.

The Prototype
For this experiment, we implemented the PARCS based on a Microsoft HoloLens 2 (v2020.3.34f). We used Unity as the main development environment, with Microsoft's Mixed Reality Toolkit (MRTK v2.8.3⁵) as the supporting framework. As a proof-of-concept, we implemented the functionality to support the detection of smart lightbulbs (Philips Hue E27) using computer vision (OpenCV v4.7.0⁶). The Smart Home hub itself consists of a Raspberry Pi 3 running Home Assistant (v2023.5.3). The popular open-source project Home Assistant offers several thousand integrations, including 141 smart light ecosystems [5].
As a use case, we implemented the positioning of smart lights within an already configured Smart Home environment without knowledge about specific device positions. Smart lights were our primary choice as they usually occur several times in a Smart Home, give immediate visual feedback to users, and were the most natural device category to build a camera-based position estimation prototype for due to distinct visual characteristics (blinking) and simple, unified APIs. The HMD connects to the Smart Home hub and sends commands to the individual Smart Home devices via the Home Assistant REST API. To detect an individual device, the smart light is turned on and off repeatedly. This approach is manufacturer-independent, as the Smart Home hub abstracts and exposes each smart light as a light entity with a fixed feature set. The "turn on" and "turn off" commands are available for all smart lights by definition.
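The on/off signalling described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (the prototype runs in Unity on the HoloLens 2). It only builds the HTTP requests and assumes Home Assistant's documented service endpoint (`POST /api/services/light/turn_on` and `turn_off`) with a long-lived access token. The host name, token, entity ID, and the helper names `build_light_request` and `blink_sequence` are hypothetical.

```python
import json
from urllib.request import Request


def build_light_request(base_url, token, entity_id, turn_on):
    """Build (but do not send) a Home Assistant service call for one light.

    Hypothetical helper: base_url, token, and entity_id are placeholders.
    """
    service = "turn_on" if turn_on else "turn_off"
    url = f"{base_url.rstrip('/')}/api/services/light/{service}"
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return Request(url, data=body, headers=headers, method="POST")


def blink_sequence(base_url, token, entity_id, cycles=3):
    """Return alternating on/off requests that make one light blink."""
    requests = []
    for _ in range(cycles):
        requests.append(build_light_request(base_url, token, entity_id, True))
        requests.append(build_light_request(base_url, token, entity_id, False))
    return requests
```

Each request could then be sent with `urllib.request.urlopen`, with a short pause between calls so that the camera can observe every state change. Because only the generic `light` domain is addressed, the same sketch applies to any light entity the hub exposes.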
The AR application queries the most recently triggered motion sensor, if available, to determine and suggest the area that the user is currently in. Otherwise, the user can select the respective area or room manually. Then, a list of all smart lights in the area is retrieved and turned off. Using the front-facing RGB camera of the HMD, the contour of bright surfaces or reflections is detected using a technique adapted from Suzuki et al.'s work [52]. Once the planar coordinates of the camera's image are calculated, these are projected on the 3D mesh generated by the HoloLens' depth camera, determining the coordinates of the bright spot. The application marks these spots to ignore and avoid false positives later on. Next, the first smart light is turned on and off repeatedly for detection and the user is asked to look towards the device. After each "turn on" command, the application considers the 3D position of each new bright spot as a potential candidate for the device and removes bright spots that remain after turning the device off again. Hence, if only one candidate remains consistently, the process terminates, stores the position of the device, and continues with the next one. This approach is executed locally and in real-time on the HoloLens without any perceivable detriment to the HMD's frame rate. Images are captured at 15 frames per second, and each image is analyzed within 4 frames of the application's update loop (≤ 67 ms). Depending on the time the user requires to look towards the flashing device, the process can take less than 5 seconds per device.
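The candidate-elimination logic described above can be sketched in Python, assuming bright spots have already been detected and projected to 3D points. This is an illustrative reading of the procedure, not the prototype's code. The matching tolerance `tol`, the number of consistent cycles `needed`, and the function names are assumptions introduced for the example.

```python
from math import dist


def near(point, others, tol):
    """True if point lies within tol (metres, assumed) of any point in others."""
    return any(dist(point, other) <= tol for other in others)


def locate_device(baseline, cycles, tol=0.15, needed=2):
    """Return the device position once one candidate stays consistent.

    baseline: bright spots seen while all lights are off (ignored later).
    cycles:   iterable of (spots_while_on, spots_while_off) observations.
    tol and needed are illustrative values, not taken from the paper.
    """
    ignored = list(baseline)
    streak, last = 0, None
    for spots_on, spots_off in cycles:
        # New bright spots after "turn on" are potential candidates.
        new_spots = [p for p in spots_on if not near(p, ignored, tol)]
        # Spots that remain after "turn off" cannot be the device.
        candidates = [p for p in new_spots if not near(p, spots_off, tol)]
        ignored.extend(p for p in spots_off if not near(p, ignored, tol))
        if len(candidates) == 1:
            spot = candidates[0]
            same = last is not None and dist(spot, last) <= tol
            streak = streak + 1 if same else 1
            last = spot
        else:
            streak, last = 0, None
        if streak >= needed:
            return last
    return None
```

Requiring the same single candidate in consecutive cycles is one way to express "remains consistently". A stricter or looser criterion would only change `needed`.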
The general design of the interaction was created following the HoloLens 2 guidelines from the official MRTK documentation⁷. Following these recommendations, interaction with near elements and hand menus was controlled using finger-pointing. The positioning of the spheres to mark the spatial coordinates of the Smart Home devices was based on the go-go interaction technique, to reach distant locations and minimize the required movements [37]. We abstained from further embellishments to minimize external factors in the behavior observed during the study.
The source code⁸ of the implementation and a depiction of the process at the end of the accompanying video are made available with the article.

Three Interaction Design Variants
Beyond the light detection feature, we adopted the Wizard-of-Oz technique both to focus our research on the user interaction experiences and to extend the PARCS' feature set. Specifically, we simulated a perfectly functional application that could allow the user to control 50 smart lights and 4 smart speakers. To gain insights into the potential trade-offs between psychological needs and classical UX dimensions, we developed an experiment comparing degrees of system automation, as we expect this dimension to substantially impact psychological need fulfillment (see RQ1-3 in the Introduction). We designed three variants (see Figure 2):

3.2.1 Manual.
The manual setup is proposed as the more need-recognizing condition for the interaction and lacks intelligent support. The user interacts with one device at a time (e.g., lights and audio devices). We used audio devices in addition to smart lights to stimulate another sense as contrast. Devices attract the attention of the user through their feature sets (e.g., lights turning on/off, audio devices playing sounds). The user then positions a virtual sphere on the device, which functions both as an anchor for the Smart Home system and a visual interface for the user. Spheres are initially positioned in abundance on the floor and can be chosen indistinctly, to avoid the spawning and search of new spheres. Once the users are satisfied with the position of the sphere, they open a hand menu by making a gesture to confirm the positioning and cue the system to move on to the next device. This process is repeated for each of the available devices. It is important to note that the coordinates of the device are obtained from the user's manual positioning of the sphere. We expected this interaction variant to best fulfill the psychological needs by offering complete control over the process (ensuring autonomy) and fostering a sense of mastery and engagement (related to competence and relatedness through active engagement with the system). However, we also expect potential challenges with a fully manual approach, particularly as the number of devices increases, prompting us to consider trade-offs between psychological needs and traditional usability dimensions.

3.2.2 Automatic.
In contrast to the manual design, the fully automatic variant reduces the users' involvement to the minimum. In this condition, the recognition of all devices is parallelized, and all devices emit their signals simultaneously. While the sensors available on the HoloLens 2 make this variant technically feasible, the effort to develop such a system surpasses the scope of our work. Thus, to provide this functionality, we resorted to the Wizard-of-Oz technique and simulated the automated location of devices. This is achieved by actuating all the Smart Home devices simultaneously for 25 seconds. After that time, all devices are turned back to their idle states and all interaction spheres are shown in their correct (pre-recorded) positions. We expect that a fully automated setup might be easier and more convenient (higher classical UX), yet might move the user "out of the loop". Thereby, a fully automated setup could also be experienced as off-putting and disconnecting, thwarting psychological needs.
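The simulated automatic detection amounts to a fixed script, sketched below in Python under stated assumptions. The callbacks `set_signal`, `wait`, and `show_sphere` are hypothetical stand-ins for the hub commands, the timer, and the AR rendering. Only the 25-second duration and the use of pre-recorded positions come from the description above.

```python
def run_automatic_variant(devices, recorded_positions, set_signal, wait,
                          show_sphere, duration_s=25.0):
    """Simulate fully automatic detection (Wizard-of-Oz sketch).

    All callbacks are hypothetical and injected, so the routine can be
    driven by a real hub or by test doubles.
    """
    # Actuate every device at once so the detection appears parallelized.
    for device in devices:
        set_signal(device, True)
    wait(duration_s)
    # Return all devices to their idle state.
    for device in devices:
        set_signal(device, False)
    # Reveal the interaction spheres at their pre-recorded positions.
    for device in devices:
        show_sphere(device, recorded_positions[device])
```

In a live session, `wait` could simply be `time.sleep`. Injecting it keeps the sketch testable without a 25-second delay.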

3.2.3 Semi-Automatic.
Finally, the semi-automatic interaction can be seen as an assisted approach that could bridge the UX/needs trade-off discussed for the previous two design variants. To achieve this effect, we designed the semi-automatic interaction to feature control and automation on demand.
As in the manual condition, devices connected to the Smart Home are configured sequentially, one at a time. Each device is actuated individually until the users fix their head gaze towards the device for at least 2 seconds. The successful spatial setup of the device is indicated by the appearance of a control sphere on the device and a short sound signal.
After configuring the device, users are prompted to choose between continuing the configuration for each single device, or setting up all devices from the same category (e.g., lights or audio devices) simultaneously. After the user confirms their position, the device is automatically recognized and its position is calculated and recorded. If the user chooses the second option, all devices of the category are actuated simultaneously (e.g., all lights blink), and the user configures each of them by fixing their gaze in the direction of the devices. Once it is configured, each device immediately stops emitting signals, thus allowing the user to choose a different device from the remaining ones.
Regardless of this choice, the semi-automation of the PARCS is limited to calculating the position of the device, while the rest of the process is still controlled by the users.
For our experiment, the position of devices is already known to the Wizard-of-Oz system. This significantly simplifies the recognition process, which is reduced to tracking the user's gaze and reacting when it hovers over the invisible target of the goal device for more than two seconds.
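The dwell-based confirmation used in the semi-automatic variant can be sketched as follows. This is a simplified Python illustration rather than the study software. It assumes timestamped head-gaze samples (origin and direction) and approximates the invisible target by an angular tolerance `max_angle_deg`, which is a hypothetical parameter. Only the two-second dwell comes from the text.

```python
import math


def angle_to_target(origin, direction, target):
    """Angle in degrees between the gaze ray and the line to the target."""
    to_target = [t - o for t, o in zip(target, origin)]
    norm = math.hypot(*direction) * math.hypot(*to_target)
    if norm == 0:
        return 180.0
    dot = sum(a * b for a, b in zip(direction, to_target))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def dwell_confirmed_at(samples, target, max_angle_deg=10.0, dwell_s=2.0):
    """Return the timestamp at which the gaze has rested on the target
    for dwell_s seconds without interruption, or None if it never does.

    samples: iterable of (timestamp_s, gaze_origin, gaze_direction).
    """
    start = None
    for t, origin, direction in samples:
        if angle_to_target(origin, direction, target) <= max_angle_deg:
            if start is None:
                start = t
            if t - start >= dwell_s:
                return t
        else:
            start = None  # Looking away resets the dwell timer.
    return None
```

Resetting the timer on every glance away means that only an uninterrupted dwell confirms a device, which matches the requirement to "fix" the gaze for the full interval.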

The Smart Home Environment
The experiment was conducted at our lab (to ensure anonymity, we exclude distinctive details from this manuscript. A thorough description of the infrastructure would be added in a camera-ready version).
The used space is a dedicated room with a surface of 74 m² (around 800 sq ft), fully dedicated to the purpose of replicating a real Smart Home environment. The interior design resembles a modern open apartment with a fully functional kitchen, a living room with comfortable sitting options, a dining area with a large table, and multiple props to reproduce the appearance of an inhabited home.
The Smart Home devices are managed using Home Assistant and include:
• 60 distinct lights (spots, panels, ambiance luminaires, all controlled via DALI)
Other devices, such as door locks, atmospheric sensors, smart appliances, or cameras, were not used in the study and thus not listed.

EXPERIMENT DESIGN
We designed the experiment to reproduce a realistic use case scenario while attempting to consistently collect reliable data to reach our research goals. To achieve this, we devised a scripted procedure consisting of three tasks, one per condition, and used standardized questionnaires to collect quantitative data. Additionally, we collected qualitative data through individual semi-structured interviews with all participants of the experiment.

Procedure
The participation had a total duration of approximately 60 minutes for each participant. Participation and travel time to the remote location of the lab were compensated for a fixed total of 70€. This amount was suggested by the recruiting agency in consideration of the increased logistics and travel time required for participation during working hours.
4.1.1 Preparation.
Participants were welcomed, briefed, and prompted to provide written informed consent for their participation. Details regarding data privacy were collected, processed, and stored following European GDPR and approved by our data protection office. The participants then received a short introduction to the concepts of Smart Homes and AR. This was followed by an explanation of the problem of assigning real-world positions to the devices connected to a Smart Home system and how this can be achieved using AR.
Before starting the tasks, participants were asked to fill out a questionnaire collecting information about prior experience with Smart Homes and AR, and categories of Smart Home devices in possession and planned to be purchased.
Next, participants were asked to wear the HoloLens 2 and follow the calibration procedure. This was followed by two interactive tutorials. The first one was based on the default MRTK Hand Interaction Sample Scene⁹, including the use of the hand menu¹⁰ gesture. This tutorial acquaints the user with the general interaction concept and, in particular, with the elements relevant to this user study. The second tutorial teaches the participants how to turn lights on and off using the interactions learned during the previous tutorial.

Task.
The order in which the interaction design variants were administered was counterbalanced across participants to compensate for learning effects. For the manual variant (see Section 3.2.1), participants were asked to position the spheres manually for 50 lights and 4 speakers. This condition of the task was limited to 12 minutes for the sake of brevity, and to keep the participation within a reasonable time frame. We included all available lights in the lab for consistency and to prevent participants from completing the task before the time limit had passed. Once the time had elapsed, the task was interrupted regardless of the progress achieved.
For the semi-automatic variant, participants were asked to use the interaction described in Section 3.2.3. The task consisted of assigning the same 50 lights and 4 speakers used in the manual condition. This task was also limited to 12 minutes. The automatic variant followed the methodology described in Section 3.2.2. Thus, the duration was limited to less than a minute.
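The counterbalancing of condition order can be sketched as follows. The text does not state the exact scheme, so this minimal example assumes full counterbalancing over all six possible orders of the three variants, assigned round-robin by a hypothetical participant index. The function name and condition labels are illustrative only.

```python
from collections import Counter
from itertools import permutations

# Illustrative labels for the three interaction design variants.
CONDITIONS = ("manual", "semi-automatic", "automatic")


def assign_orders(n_participants, conditions=CONDITIONS):
    """Assign each participant one of the possible condition orders.

    Assumption: full counterbalancing with round-robin assignment.
    The study's actual assignment scheme is not specified in the text.
    """
    orders = list(permutations(conditions))  # 3! = 6 possible orders
    return [orders[i % len(orders)] for i in range(n_participants)]


schedule = assign_orders(28)
# How often each variant is administered first under this scheme.
first_positions = Counter(order[0] for order in schedule)
```

With 28 participants, six orders cannot be perfectly balanced because 28 is not a multiple of 6. Under this assumed scheme, four orders occur five times and two orders occur four times.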
After concluding the task for each application variant, participants were asked to fill out multiple questionnaires: Technology-based Experience of Need Satisfaction (TENS) [36], the short version of the User Experience Questionnaire (UEQ-S) [46], the Technology Acceptance Model (TAM) [34,56], and the NASA Task Load Index (NASA-TLX) [18].
The UEQ-S and NASA-TLX are well-established tools in HCI to measure subjective user experience and subjective workload, respectively. We used TAM to assess perceived values, perceived enjoyment, perceived usefulness, and intention to use [56]. Following the literature and to keep the questionnaires short, we used only the item with the highest factor load for each of the target topics.
We used a subset of the TENS questionnaire, namely the TENS-Interface and the TENS-Life. The TENS-Interface questionnaire assesses autonomy and competence. In the TENS-Interface questionnaire, the third self-determination theory construct of relatedness is optional. Yet, we wanted to explore whether a direct interaction model and the setup process would have effects on relatedness, not to other people, but rather to the Smart Home environment itself. Therefore, the TENS-Life subscale was adapted and used to assess perceived relatedness.
After the completion of the task for the three variants, we collected data about each participant's gender, age interval, and type of home. Additionally, they filled out the Affinity for Technology Interaction scale questionnaire (ATI) [14]. During a short semi-structured interview, participants provided insights regarding general observations, preferences, and efficiency ranking of the alternatives as well as overall user experience feedback. Interviews were recorded, transcribed with Whisper AI¹¹, manually checked for errors, formatted, and coded. The interview guide is available in the supplementary materials.

Participants
We recruited 28 participants through a specialized agency. We targeted the general adult population within a radius of 50 km of the lab. Thirteen participants identified as male, while the remaining 15 identified as female. The age range was 19 to 64 years, with an average of 36. Regarding their living accommodations, 18 participants reported living in an apartment, 8 lived in a house, and 2 occupied a room in a shared flat. Seventeen participants reported having at least one Smart Home device, and 14 of them have been using Smart Home technology for longer than 2 years. Participants who use Smart Home technologies own devices from an average of 5 device categories (range: 2 to 12) out of an open list of 16 categories based on Home Assistant's physical entity types¹². Eleven participants indicated they would buy more Smart Home devices in the future, 9 were undecided, and 8 would need to inform themselves before deciding to buy more.
Regarding experience with AR technologies, 16 participants claimed to have no prior experience with HMDs. 10 participants had used AR HMDs once or twice, and 2 participants had used AR HMDs more than two times.

RESULTS
We analyzed the collected data using non-parametric Friedman tests since the assumptions of normality and sphericity for ANOVA were not met for all tests. In the cases where significant differences between conditions were found, we applied Conover's test with Bonferroni correction for post-hoc analysis [10]. The significance level was set to the conventional value of 0.05 for all tests. An overview of the results can be seen in Table 1.
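To illustrate the omnibus test named above, the following is a minimal pure-Python sketch of the Friedman statistic for three repeated-measures conditions. It is not the analysis code used in the study: the scores are synthetic, no tie correction is applied, and Conover's post-hoc test is omitted. The p-value relies on the chi-square approximation, whose survival function for two degrees of freedom is exp(-x/2).

```python
import math


def average_ranks(values):
    """Rank values in ascending order, giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for idx in range(i, j + 1):
            ranks[order[idx]] = mean_rank
        i = j + 1
    return ranks


def friedman_three_conditions(rows):
    """Friedman chi-square and p-value for rows of three scores each.

    Each row holds one participant's scores in the three conditions.
    The p-value uses the chi-square approximation with df = 2.
    """
    n, k = len(rows), 3
    rank_sums = [0.0] * k
    for row in rows:
        for j, rank in enumerate(average_ranks(list(row))):
            rank_sums[j] += rank
    chi2 = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    return chi2, math.exp(-chi2 / 2.0)


# Synthetic scores (NOT study data): manual, semi-automatic, automatic.
scores = [(30, 12, 10), (45, 20, 15), (25, 10, 12), (50, 14, 9), (35, 11, 8)]
chi2, p = friedman_three_conditions(scores)
```

Under the same approximation, a statistic of 20.434 with two degrees of freedom, as reported for the overall task load, corresponds to p = exp(-10.217), roughly 3.7 × 10⁻⁵, which is consistent with p < .001.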

User Experience and Task Load
The scores for the Task Load Index were significantly different between the manual condition and the semi-automatic condition (see Figure 4), except for the performance subscale. Cronbach's alpha for the task load was 0.901. On a scale from 0 (no load) to 100 (high load), the overall task load scores for the semi-automatic condition (M = 12.083, SD = 13.812; p < 0.01) and the automatic condition (M = 11.548, SD = 12.903; p < 0.001) were significantly lower (χ²(2) = 20.434, p < .001) than in the manual variant (M = 29.315, SD = 21.998).
The UEQ-S is measured with a 7-point Likert scale, with values between -3 and 3 (see Figure 5). Cronbach's alpha is 0.813 for the hedonic and 0.716 for the pragmatic subscale. The collected UEQ values are consistently high for all items across all conditions. The overall UEQ-S score is significantly higher (χ²(2) = 20.058, p < .001) for the semi-automatic (M = 2.562, SD = 0.351; p < 0.001) and automatic variant (M = 2.429, SD = 0.712; p < 0.01) compared to the manual alternative (M = 1.812, SD = 0.846). While both semi-automatic and automatic options have significantly higher scores on the pragmatic and hedonic UEQ-S subscales, only the pragmatic scores show a relevant difference. The hedonic user experience is rated very high for all three conditions (M > 2.5). Notably, all 28 participants rated the semi-automatic experience with the highest score for the decision between "usual" and "leading edge".
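The internal-consistency values reported above follow the standard definition of Cronbach's alpha. A minimal sketch of that computation is given below; the responses are synthetic and serve only to show the formula, and the function name is an illustrative choice.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-respondent item-score tuples.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(item_scores[0])
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Synthetic responses on the -3..3 scale (NOT study data):
# five respondents, four items.
responses = [(2, 3, 2, 3), (1, 1, 2, 1), (3, 3, 3, 2), (0, 1, 0, 1), (2, 2, 3, 3)]
alpha = cronbach_alpha(responses)
```

Values above roughly 0.7 are conventionally read as acceptable internal consistency, which is the range the reported subscale values fall into.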

Interviews
A thematic analysis was conducted on the data collected in the interviews, using an inductive coding approach [9]. In total, two and a half hours of audio-recorded interviews were transcribed (total duration: 02:27:26, average duration: 00:05:16, SD: 00:02:34). Two researchers independently coded 6 randomly sampled interviews (ca. 20% of the total). Duplicate codes were removed, and a final coding tree was jointly developed and refined through an in-depth discussion of results. Subsequently, one researcher coded the rest of the interviews. Based on the coding tree, the following six overarching categories were identified, comprising a total of 18 themes. Figure 6 shows the distribution of the occurrences of each category and theme. In the following paragraphs, we summarize the categorization and provide exemplary quotes for each of the themes.

First Impressions.
A subset of the material is related to the initial impressions of participants when interacting with the prototype. Many participants (n = 18) emphatically expressed strong enjoyment of using AR HMDs to set up a Smart Home. This group included both experienced and novice users of AR. We called this theme Wow Factor. Extending these thoughts, we defined a theme as Curiosity Evoking, for statements about how the interaction mode evoked curiosity and exploration, in order to get to know the system and the Smart Home environment (n = 10). At the same time, a repeatedly occurring theme was the need to learn how to properly use the HoloLens in the setup process (n = 18). We classified this as Learning Effects. Importantly, participants stated that initial challenges with the interaction could be overcome quickly within the time of these first interactions, or that they believed additional practice would surely enable them to use the system well.

"I had to first get used to what the device wanted from me. And practice that. It's a matter of practice for me." -P21
To that end, participants repeatedly remarked about some initial difficulties with the interaction mode ("the pinching" motion for positioning the bulbs in the room was sometimes mentioned as error-prone; n = 21) and errors in the manual positioning due to depth perception conflicts (n = 4), where they thought they had placed a bulb at a more distant location that was later revealed to be incorrect but was not visible from the initial vantage point. However, no participant considered these challenges a major issue, but rather an annoying nuisance emerging during the first steps of the manual configuration of the Smart Home.

Technical Capabilities.
The technical capabilities of the prototype were another recurring topic. As described above, some users declared experiencing Depth Perception Issues (n = 4), indicating that the UX suffers detrimental effects caused by technical limitations.
"The problem was in the depth, but also somehow the position in the room in general. So the perspective didn't always quite fit." -P7
We did not measure the offset between the actual placement in the manual task and a potential correct position. Correct placement is partly subjective, as users have to choose where they want to interact with the device. However, all participants placed the spheres near the correct device without exception.
Many participants expressed having experienced Interaction Mode Issues (n = 21). In particular, "the pinching" motion for positioning the bulbs in the room was often mentioned as error-prone. General detection of gestures by the HoloLens seemed to be a recurring issue:
"So sometimes it didn't work right away to bring up the menu, or bringing up the menu worked, but then tapping on it didn't." -P20
In contrast, the automatic detection features were generally considered to function well and smoothly. Many participants reported having Trust in the Capabilities of the system (n = 17). As we used a Wizard-of-Oz study method, we should point out that this trust in the system's capabilities likely underlies other impressions about the usability and preference for interaction modes.
"In hindsight, I did think, okay, what if something goes wrong. But I felt, or I got the impression, that it then found things well. Yes, so I would trust the system." -P11
The Simplicity of the interaction (for the fully automatic variant) was also highlighted by participants as a positive feature (n = 7).
"I found it quite exciting to see how fast some things can happen, how everything is captured automatically." -P22
5.4.3 Affect, Load & Control, Relationship & Understanding, and Diverse Ideas. Beyond these more general observations, the remaining emerging themes are best discussed in connection to the different conditions. Here, especially the manual and automatic characteristics of the setup were contrasted by the study participants.
The manual aspect of the home configuration was often appraised as playful and Fun (n = 12), often mentioned together with the curiosity about the system's functioning (see above), and an interest in feeling an achievement through the setup process (that is not given by automatic configuration) or a sense of Personalization (n = 11) connected to the setup of the Smart Home.
"I also liked the manual version because it has this certain playful aspect to it, and honestly, you don't set up new devices that often." -P22
"Well, I believe the version where I can set it up myself is just more individualized." -P4
Similarly, the advantage of staying in Control and keeping an Overview of the process was mentioned (n = 23).
"I did a bit, walked around the apartment a bit. I felt responsible for the setup, but didn't have to do everything myself." -P18
However, another major theme for the manual setup was its Strenuous and demanding nature (n = 23).
"It was just frustrating with the whole setup of the individual devices." -P17
This was mentioned as the major downside of the manual setup experience, together with its low level of Efficiency (n = 27). For example, several participants raised doubts about the utility of a manual setup if it were employed for many Smart Home devices or repeated setups. In contrast, the automated setup features were mostly appraised as delivering high Efficiency.
"Of course, the most efficient is the automatic version. I walk through the room, and the thing is done. I don't really have to choose anything; I don't have to make any decisions." -P7
As participants reported high trust in the system's technical capabilities, this setup mode appeared to many as the quickest and easiest way to complete the task. However, especially in the fully automatic condition, participants described the experience as Overwhelming (n = 16), losing their overview of the configuration, or as evoking an Alienating (n = 5) sensation as the system takes over the task completely.
"I think, for example, I would not recommend this to my mother; she would probably freak out if something like this happened in her apartment." -P18
"It was also a bit strange, especially when all the things started to light up or draw attention to themselves." -P28
Between these two extremes, the majority of the participants appraised the semi-automatic condition as the best of both worlds. This is reflected in the identified preferences for either condition (see Section 5.5). However, we believe that this preference does not merely emerge as a compromise between the two approaches but rather as a productive integration of nuanced aspects of them. We observed that participants often mentioned a preference for combining both manual and automatic processes, and also benefiting from reduced levels of both aspects, resulting in a more Comfortable interaction (n = 15).
"The second one [the semi-automatic condition] was the most relaxed, I could pick a few devices that I want and the rest is done automatically." -P5
For example, the group-wise setup process was often appraised as providing a necessary overview that brings users "on board" with the partially automatic configuration, through which a sense of cooperation and Partnership emerged (n = 10).
"There, I just have the feeling of having accomplished something and having contributed, and the device doesn't do everything on its own." -P7
In this spirit, we also want to highlight that participants discussed related Diverse Ideas (n = 15) for integrating the features from the three conditions further and did not just declare a preference for one over the other. For example, participants remarked that further gamification of the manual approach would be interesting or that the Adjustment Options (n = 9) of choosing the setup approach based on mood, time pressure, or user in the household would be beneficial over employing just one of the modes. Furthermore, it was a recurring theme that extending the system to allow the opposite order (automatic first, manual adjustment second) would be a vital feature.
"So, if there were, let's say, a game module included, where I could participate in some AR gaming situation with the glasses, okay, that would surely be great." -P21
"Yes, then it would be good if you could adjust it a bit." -P13

Actions & Preferences
The behavior of the participants during the experiment was recorded. Within the 12 minutes of the manual condition, we observed that participants placed 20.93 entities on average (SD = 5.74). In the semi-automatic condition, participants were given the option to parallelize the detection of the rest of the device category (i.e., lights and speakers) after each device detection. Fifteen participants chose to automate all remaining lights after one detection, 6 participants tried up to 5 individual detections, and 7 participants performed between 6 and 20 individual detections.
Overall, 21 participants (75%) stated a preference for the semi-automatic alternative over the two other variants, followed by 6 participants (21.43%) favoring the automatic option, and 1 participant (3.57%) preferring the manual option. For most participants, the automatic alternative ranked second (60.71%) and the manual alternative last (82.14%). Twenty-seven participants (96.43%) rated the automatic version as the most efficient option. One person rated the semi-automatic version as the most efficient one with the comment that they would individually check and correct each position after using the automatic variant and, thus, require more time than with the semi-automatic alternative.
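The reported shares follow directly from the counts out of 28 participants. The short check below recomputes them; the variable and function names are illustrative, and the counts are taken from the text above.

```python
# Counts as reported in the text, out of 28 participants.
N = 28
first_choice = {"semi-automatic": 21, "automatic": 6, "manual": 1}


def share(count, total=N):
    """Return a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


shares = {variant: share(count) for variant, count in first_choice.items()}
rated_automatic_most_efficient = share(27)
```

By the same arithmetic, the reported 60.71% and 82.14% correspond to 17 and 23 of the 28 participants, respectively.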

DISCUSSION
Our mixed-method results provided rich insights into the anticipated trade-offs between designing a Smart Home AR setup and classical UX dimensions (RQ1-3). Importantly, beyond our research questions, we identified valuable findings through the design exploration. To provide structure to the discussion of our findings, we group the themes as follows:

Psychological Needs & UX Trade-Offs
The data collected during the interviews, combined with the answers to the TENS questionnaire and the intention to use, suggest that perceived competence and autonomy may play a role in the preference rating between the three interaction variants. Here, we cannot exclude technical limitations as an additional factor. The HoloLens 2 offers a limited Field of View (FoV): 43° horizontal and 29° vertical, roughly a third of typical human vision [20]. This constraint becomes particularly challenging for hand interaction, since gestures must be consistently performed within the HoloLens cameras' FoV. Especially in this room-scale application, this can lead to significantly higher levels of frustration and lower pragmatic user experience ratings, and can also have an impact on perceived competence.
Both the semi-automatic and automatic variants were rated with overall low load and high user experience scores, confirming our expectations for RQ2 and RQ3 that these more automated variants would lead to a better classical UX (whereas the manual condition showed poorer UX perceptions as outlined in RQ1). Further, both the semi-automatic and automatic variants scored high levels of perceived competence. This is aligned with the preferences stated explicitly by the study participants, who largely prefer these two variants over the manual option. Overall, we were somewhat surprised by the lower levels of competence in the manual design variant, as we expected higher psychological need satisfaction in the manual condition overall (RQ1). It appears that our participants did not experience the manual setup as competence building, possibly because of some initial challenges with learning the controls, and also because the setup progress was fairly slow. While anticipated differently, this does potentially highlight the trade-off that high control can undermine competence needs if it slows the user in achieving their tasks.
Furthermore, the manual and semi-automatic variants showed similarly high levels of perceived autonomy, showing that an effective balance between automation and manual control can be achieved that still acknowledges autonomy. This observation further supports our expectation that a more manual variant would increase psychological need satisfaction (RQ1), at least for the autonomy dimension. Also, comparing the automatic against the semi-automatic variant, the perceived autonomy metric suggests that users value being involved in the interaction. This is attested by the preference for the semi-automatic variant over the more efficient automatic alternative, where users are passive observers. Thus, it is possible to argue that in this particular case, the fulfillment of psychological needs has precedence over pure functional effectiveness or efficiency. Most importantly, this result epitomizes the expected trade-offs for a fully automated setup variant (RQ2) and affirms our consideration in RQ3 that a combination of manual and automatic features could strike a more effective balance of psychological need fulfillment and classical UX design considerations.
The exploratory use of the relatedness subscale with the altered subject of the Smart Home instead of other people did not show significant differences between the conditions due to a large variance in the ratings. Interestingly, the interviews provided context to this variance, since different lines of thought between participants can be reconstructed. On the one hand, some participants reported a strong feeling of connectedness to the Smart Home environment through the immediate and direct interaction with it (evidence that would support the expectation of RQ1 that a manual interaction could create stronger need fulfillment). This is even more remarkable considering that the experiment took place in a lab inside a remote corporate complex. On the other hand, some participants felt disconnected from reality by using the AR HMD:
"The screen creates a distance. At the same time, you are in the middle of it, but like in another world. So, to me it is a different reality." -P24
Of course, it remains to be explored whether this effect is temporary and may fade away once the user gets used to AR. This prompts a further, more intriguing question about the nature of the relationship between users and Smart Homes mediated by AR. Combined with an increasing level of agency in Smart Homes and artificial intelligence applications, a high level of connectedness can result in dramatic changes in how people conceptualize homes.
On a more general note, the discussed results highlight the importance of psychological needs when considering factors for AR and Smart Home interactions. Application designers must be sensitive to the potentially diverse emotional and social effects of AR [4,50], especially in home environments.

Wow-Effect: Novelty and Ceiling Effect
In the accompanying handbook for the UEQ Scale¹³, the authors warn that it is unlikely to observe any average score above 2 due to different opinions and people's tendency to avoid extremes. Yet, the semi-automatic (M = 2.562) and the automatic version (M = 2.429) are well beyond this threshold. Additionally, participants characterized the experience as fun, futuristic, exciting, or fascinating, and 18 participants explicitly described the interaction as a great experience overall. We relate this to both novelty and ceiling effects, as no participant reported having experienced a similar AR application before. Although AR applications have been used and studied for decades, the particular application of configuring a Smart Home seemed to be particularly attractive to the study's participants. This perhaps underlines the potential for AR to establish a close connection between users and their surroundings.
However, we cannot eliminate the possibility of positive bias caused by the experience of participating in the study at a modern research facility, or by the relatively high compensation.

All Alternatives Have Their Benefits
Different characteristics of the design variants make them interesting for users, even if the overall variant is not their first choice. This is supported by participants' statements during the interviews. The automatic variant is attractive due to its efficiency, with many participants being torn between this option and meeting their psychological need for autonomy and competence through the semi-automatic alternative. The manual variant's potential for gamification was mentioned by 12 participants during the interviews. While some stated that the gamification character is not important to them, all participants who mentioned this characteristic stated that it is either important to them or to another family member. Further, it was suggested that in the case of the setup process taking longer, the fun character should be emphasized for an overall better experience.
The choice of the optimal solution will depend on the circumstances of the interaction while performing a given task in a given situation. These circumstances may pose different time constraints, different expectations towards duration and playfulness, and different expectations towards the accuracy of positioning or tidiness, thus shifting the weight from one factor to another. This is supported by the statements recorded during the interviews. For example, 9 participants stated their interest in personalizing device positions after the automatic placement, and one participant even took the time to meticulously check the position of each entity after the automatic configuration.
¹³ https://www.ueq-online.org/Material/Handbook.pdf (Accessed: 02/19/2024)
Regardless, based on the gained insights, we can formulate some recommendations for future iterations of this application. The most important is to keep the user in the loop. It is paramount to give the user options about the degree and type of automation, and to include options to adjust positions after placement. When providing fully automated placement, the process needs to be made visually transparent, make the user feel in control, and possibly offer the user the option to control or monitor the first few devices to understand the process.

Use Cases of the Setup Process
The proposed system presents clear benefits for the initial setup of multiple static devices, since the automated solutions can save significant amounts of effort and frustration. In the future, the system could allow users to easily update the position of movable devices and notify them if a device changed its location (e.g., based on wireless signal intensity). Additionally, during an initial setup, the AR HMD could visually record the position of devices and automatically detect them at the new position via image detection mechanisms. Dynamic devices capable of self-tracking, such as vacuum-cleaning robots, can be synchronized with the HMD by aligning their coordinate systems and then providing live position updates.

Future Use Cases of AR in the Smart Home
The information about the location of connected devices within a Smart Home can enable further applications well beyond the scope of our proposed design. We envision AR applications controlling not only individual entities but complete groups of entities in direct interactions. Further, interaction can simplify lengthy or complex tasks through automatic grouping of entities using different criteria (e.g., type of device, location in a given area, user preference, etc.). This can be further extended using artificial intelligence to create dynamic filters or the automatic creation of routines. This would allow users, for example, to toggle lights when entering a room or run a specific service when in the proximity of a device [45]. Finally, this can enrich the user experience in households with multiple members, empowering individual users to create both personalized and collective experiences.

LIMITATIONS AND FUTURE WORK
As stated before, the HoloLens 2 hand-tracking FoV and quality present a clear constraint for the proposed interaction. This problem can be addressed using downwards-facing cameras, as in the Apple Vision Pro. This device will likely alleviate the issues faced by the participants of our study. Furthermore, we did not measure the actual performance of the light detection implementation. As the HoloLens 2 does not have state-of-the-art sensors and cameras, performance metrics would not be representative of this approach. Still, detecting IoT device positions based on tags or even a precise ultra-wide-band indoor positioning solution is likely faster than the approach presented in this paper. However, when including the time required for setting up and calibrating such a system, we argue that our approach is faster, less error-prone, and more user-friendly.
Another important limitation is that our approach only works for devices capable of attracting attention. Lights, blinds, audio devices, fans, or anything with a display can be instrumentalized to emit an identifiable signal. Many large home appliances, such as ovens, washing machines, or hood vents, can become detectable. However, some devices can only remain silent and still, making their identification by our system more difficult. We see this challenge as hard to overcome but also of limited criticality: our system captures a large range of Smart Home devices, and especially those that come in large quantities (e.g., lights).
A further limitation to consider is the context of the study. Despite the high score of connectedness that some participants reported, the study was conducted in a lab setting. The lab aims to replicate a modern flat with many Smart Home devices, but it remains a foreign place for the study participants. A field study in actual home environments could offer a higher validity and deeper insights that could become visible only in such an environment.
Here, it is important to highlight the exploratory nature of the study. Future studies should look into long-term usage, as well as the incorporation and assessment of further functionality (e.g., adjusting placement of automatically positioned devices, automatic grouping, and incorporation of artificial intelligence elements).
Finally, this study was conducted using a Wizard-of-Oz technique to present the participants with a credible interaction.Although our prototype is capable of detecting lights on a per-device basis (similar to the semi-automated option), we plan to implement and test a fully parallelized automated version in the future.

CONCLUSION
In this paper, we investigated two main topics. First, we proposed a solution for the spatial configuration of Smart Homes using AR, developed a prototype with basic functionality, and evaluated the concept through a controlled experiment. Second, we investigated the effect of psychological needs, specifically autonomy, competence, and relatedness, as a factor of user preference for interaction design.
In the conducted user study, participants performed the task of setting up Smart Home devices spatially using an AR HMD. The task was performed under three different conditions: manual positioning, semi-automatic positioning, and automatic positioning, which we compared with respect to their support of psychological needs and classical UX dimensions.
The collected data indicates a general preference for the semi-automatic positioning method, despite the automatic alternative being faster and more efficient. The participants' statements recorded during post-participation interviews suggest that this preference stems from their psychological needs being best addressed by the semi-automatic variant. This is aligned with the reported TENS scores for autonomy, competence, and relatedness.
Additionally, the interaction design proposed for the configuration of Smart Homes was received positively by the participants. Supported by the collected data, this suggests that our technique for locating Smart Home devices is a viable alternative to typically manual approaches.
Based on the feedback collected through interviews and further insights obtained through the analysis of the quantitative data, we derived some recommendations for future applications in similar contexts.

Figure 2: Schematics of the three interaction design variants: manual (left), showing a user placing a sphere on a lamp for a manual spatial configuration of the respective device; semi-automatic (center), where the AR cameras detect a flashing smart light automatically if the user briefly focuses on the device, setting the spatial position one device at a time; and automatic (right), showing all devices emitting signals for a simultaneous spatial setup of each device.

"I wanted to try it out. I just looked to see what would happen. And then, after, I don't know, what did I click, I had seven or eight lights, so I clicked on it quite late and thought I'd give it a try." -P19

Figure 6: Occurrences of themes in the interviews. Bars and corresponding numbers on top refer to the number of participants mentioning these themes. Themes are clustered in categories by color. The codebook can be found in the Supplementary Materials.

Table 1: Result analysis: for each scale and condition, the calculated average and standard deviation, along with the results of the Friedman and Bonferroni-corrected post-hoc tests.
[Table body not recoverable; surviving entry: χ²(2) = 20.434, p < .05; significant post-hoc pairs: Man-Semi, Man-Auto]
"So, in general first. It was definitely a very interesting experience, to be honest. And it's truly impressive what's possible and how it might actually look in the future." -P28