The relationship of visual and aural perspective with decentering in virtual reality

Several areas of application for virtual reality technologies have seen significant growth recently, including mindfulness practice. This paper investigates the relationship between the mindfulness-related construct of decentering, i.e., detaching oneself from automatic thinking and observing one's thoughts and emotions, and body representation within virtual reality. Using a within-subjects approach, we investigated whether a third-person perspective, both in terms of visual and aural representation, affects decentering in virtual reality. This was done using a virtual reality application that presented a guided body-scan meditation and utilised these different forms of perspective. Our results do not show a significant effect of either form of perspective on decentering in virtual reality.


INTRODUCTION
Mindfulness practices have seen significant growth and positive results as an inexpensive and relatively safe treatment for various healthcare-related topics in the last 10 years [16]. The term "mindfulness" itself broadly refers to a present-moment awareness of one's own thoughts, emotions, and bodily sensations [2], whereas mindfulness practices refer to ways of promoting this awareness, such as meditation [11]. An example of this is a body-scan meditation, which refers to systematically paying attention to different areas of one's own body, often guided by audio narration/instructions [5]. Mindfulness-based interventions have been effective for a variety of conditions, including depression, anxiety, stress, and addiction [16]. A commonality between various styles of mindfulness practices is a goal of "distancing" oneself from one's thoughts, emotions, etc., toward observing the inner processes of one's own mind and body objectively [11]. This process of adopting a "self-distanced perspective" is a crucial component toward the building of internal regulation of one's own thought processes, emotions, etc., in contrast to an automatic reactivity to these processes [2,4]. This ability is often referred to as "decentering", which describes this metacognitive ability to detach or disidentify oneself from automatic thinking and reactivity to thought through establishing meta-awareness of thought [2,9].
The aim of this paper is to investigate the role of a technology-aided mindfulness application in helping users adopt decentering by representing this self-distancing using virtual reality (VR). This was done by representing a self-distanced perspective to participants through a third-person perspective of their own virtual body. Our approach was guided by the notion that decentering relies on establishing a sense of distance between oneself and one's automatic psycho-physiological processes in a way that affords an objective "outsider" view. The study was motivated by a lack of exploration with regards to self-representation, specifically toward using VR to create an altered or illusory sense of self-representation that may affect an individual's sense of self-perception and decentering [1,7]. There is also a general lack of investigation into different sensory perspectives in VR, such as using a visual third-person perspective (3pp) as opposed to the "default" first-person perspective (1pp) [6]. We thus extend this investigation to include a separate audio perspective, i.e., listening from a different point in virtual space, since there is limited research on the effect of audio properties in VR mindfulness [3]. The study was thus guided by the following research question: what is the effect of a visual and aural third-person perspective on the experience of decentering in virtual reality?
In the context of this study, VR refers specifically to the use of a head-mounted display (HMD) such as a Meta Quest or HTC Vive. Research into the use of VR in mindfulness has seen an uptake since 2011 [1,7], driven by possible advantages of VR in this context, such as the ability to simulate any space/environment with high fidelity and to alter one's bodily perception through representation of a virtual body and a sense of presence within this body [8,13]. Previous research in the use of VR for mindfulness has explored various factors present in guided meditation applications, such as approaches for narration, environments, and interactivity [1,7]. Various outcomes have been investigated, including interoceptive awareness using the Multidimensional Assessment of Interoceptive Awareness (MAIA) scale and an overall mindfulness score using the Five Facet Mindfulness Questionnaire (FFMQ) [7]. Decentering, however, remains underexplored in VR mindfulness research, even though it is crucial not only for improving the efficacy of VR mindfulness applications but also for reducing possible negative effects stemming from maladaptive behaviours gleaned from mindfulness practices [2,4]. To address this gap, our study investigated the effect of virtual self-representation on decentering in the context of a VR mindfulness application. We thus formulate our hypotheses as follows:
• H1a: The use of a third-person visual perspective in a VR mindfulness application will lead to an increased sense of decentering compared to a first-person visual perspective.
• H1b: The use of a third-person aural perspective combined with a third-person visual perspective in a VR mindfulness application will lead to an increased sense of decentering compared to a first-person aural perspective combined with a third-person visual perspective.

METHODS
This study followed a quasi-experimental (crossover) approach using a within-subjects design and three experimental conditions: visual 1pp and aural 1pp (1v1a), visual 3pp and aural 1pp (3v1a), and visual 3pp and aural 3pp (3v3a), explained in more detail below. A within-subjects design was employed to detect differences between the aural perspectives (3v1a and 3v3a), which were expected to have a relatively small effect since this is not a common design approach in existing media. We also expected the carry-over effects of decentering to be small enough to be managed by counterbalancing. Analysis was done using Jamovi [14]. Following the principles of open and reproducible science, the full study was pre-registered at: https://osf.io/v6az5.

Participants
The sample consisted of 52 participants (33 M, 18 F, 1 U) aged 19-45 (M = 27.6, SD = 5.14) with no known issues using VR. Participants were recruited using mailing lists for Tampere University and for participation in scientific research. Sample size was determined through a combination of power estimation and study time constraints. Fifty participants would have approximately 80% power in a within-subjects design for detecting a main effect of d = 5 when comparing condition A to conditions B and C. While the initial sample was planned at 50 participants, more were recruited to anticipate drop-outs or otherwise unusable data, hence the final sample size of N = 52. As the national research guidelines of Finland indicate no requirement for an ethical approval for this study, the voluntary participation information sheet and the privacy notice were approved by the University. Participant compensation consisted of a movie voucher.

Materials
To test our hypotheses, we developed a VR application to administer a mindfulness exercise. The application placed users in a forest environment and played a guided body-scan exercise. The avatar was created as a simplified body representation using simple geometric shapes to avoid specific gender/racial associations. For the body-scan exercise, two previously developed 10-minute audio recordings from different mindfulness-related studies [5,15] were utilised with the authors' permission. Since participants were exposed to three experimental conditions in a row, it was anticipated that using the same 10-minute audio clip for each condition would result in listening fatigue; hence, two different clips were utilised.
For each condition, mindfulness narration and ambient music were localised to two abstract objects rotating in opposite directions around the avatar's head in 3D space. In the visual 1pp condition, participants embodied and perceived the virtual environment as the virtual avatar, whereas in the visual 3pp condition, they saw the virtual avatar from behind and to the right, but the avatar still copied the participants' body movements. In the aural 1pp condition, participants heard the sounds from the floating objects as if they were orbiting their own head, whereas in the aural 3pp condition, sound was heard from the position of the 3pp avatar's head. The application was created using Unity3D with the Microsoft Spatializer plugin for spatial audio [10] and was deployed on an HTC Vive using the built-in headphones of the device.
For measuring decentering, the Detachment from Automatic Thinking dimension of the embodied mindfulness questionnaire (EMQ) [9] was used, which captures "the ability to detach from automatic (unintentional) thinking or its opposite (i.e., being attached to, caught up, absorbed by, or believing thoughts)" [9:5] through items such as "I got absorbed by my thoughts". The five items were rated on a 5-point Likert scale, with the anchors 1) Almost Never and 5) Almost Always. Calculating the dimension mean involved reverse-scoring so that a higher mean reflects a higher degree of decentering. The measured dimension demonstrated a high degree of internal consistency for each condition: McDonald's ω (A: ω = 0.90, B: ω = 0.84, C: ω = 0.86).
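The scoring step described above can be sketched as follows. This is a minimal illustration rather than the study's scoring script: the function names are hypothetical, and which of the five items are reverse-keyed is passed in as a parameter because the text does not list the keying of each item.

```python
# Minimal sketch of scoring one 5-item dimension on a 1-5 Likert scale.
# Assumption: the caller knows which item positions are reverse-keyed;
# the source text does not specify this per item.

SCALE_MIN, SCALE_MAX = 1, 5  # anchors: 1) Almost Never, 5) Almost Always


def reverse_score(rating):
    """Mirror a rating on the scale, so 1 becomes 5, 2 becomes 4, and so on."""
    return SCALE_MIN + SCALE_MAX - rating


def dimension_mean(ratings, reverse_keyed):
    """Return the mean of the ratings after reverse-scoring the keyed items.

    ratings: list of integer ratings, one per item.
    reverse_keyed: set of zero-based item positions to reverse-score.
    """
    for r in ratings:
        if not SCALE_MIN <= r <= SCALE_MAX:
            raise ValueError(f"rating {r} is outside the {SCALE_MIN}-{SCALE_MAX} scale")
    scored = [
        reverse_score(r) if i in reverse_keyed else r
        for i, r in enumerate(ratings)
    ]
    return sum(scored) / len(scored)


# Example: a participant who rarely got absorbed by their thoughts.
# All five items are treated as reverse-keyed here purely for illustration.
example = dimension_mean([2, 1, 2, 3, 2], reverse_keyed={0, 1, 2, 3, 4})
```

With this keying, low agreement with items such as "I got absorbed by my thoughts" yields a high dimension mean, i.e., a higher degree of decentering.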

Procedure
Before the test, participants filled out a survey which included demographic information, using LimeSurvey (https://www.limesurvey.org). On-site, participants were exposed to the three conditions in a randomised order and completed the EMQ after each exposure. The visual and aural perspectives served as the independent variables and the experience of decentering served as the dependent variable. Both the order of the experimental conditions, i.e., 1v1a, 3v1a, and 3v3a, and the order of the two audio clips were randomised separately for each participant, thus six possible orders were randomly assigned to participants. For the audio clips, the order determined which clip was used first, thus the two options were 1, 2, 1 and 2, 1, 2. At the start of the test, participants were helped with the VR headset and given in-VR instructions to interact with two buttons in the application to ensure participants observed the behaviour of the different virtual body representations.
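The randomisation described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure used on-site: it draws each participant's condition order and clip order independently and uniformly at random, whereas the study may have balanced the orders across participants. The function name and the seed are hypothetical.

```python
import itertools
import random

CONDITIONS = ("1v1a", "3v1a", "3v3a")
# The two clip sequences named in the text: clip 1 first or clip 2 first.
CLIP_ORDERS = ((1, 2, 1), (2, 1, 2))

# All orderings of three conditions: 3! = 6 possible orders.
CONDITION_ORDERS = list(itertools.permutations(CONDITIONS))


def assign_session(rng):
    """Draw one condition order and one clip order, independently.

    Returns a list of (condition, clip) pairs in presentation order.
    Assumption: simple random assignment without balancing across participants.
    """
    condition_order = rng.choice(CONDITION_ORDERS)
    clip_order = rng.choice(CLIP_ORDERS)
    return list(zip(condition_order, clip_order))


rng = random.Random(2024)  # fixed seed only to make the sketch reproducible
session = assign_session(rng)
```

Each session contains every condition exactly once, paired with an alternating clip sequence, so no participant hears the same clip in two consecutive conditions.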

RESULTS
A one-way repeated measures ANOVA was conducted to determine whether there was a significant difference in detachment from automatic thinking across the three conditions (H1a and H1b). There were no outliers and the data for each condition was normally distributed, as assessed using boxplots and the Shapiro-Wilk test (p > .05), respectively. The assumption of sphericity was not met, as assessed by Mauchly's W (W = 0.866, p = 0.027) and, as such, Huynh-Feldt corrections were applied. No significant differences between the conditions were found, F(1.82, 89.92) = 0.351, p = 0.685. Therefore, neither hypothesis is supported.
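For readers unfamiliar with the test, the F statistic of a one-way repeated measures ANOVA can be sketched as follows. This is a minimal pure-Python illustration, not Jamovi's implementation: the function name is hypothetical, the sphericity correction factor epsilon is supplied by the caller rather than estimated, and the p-value (which requires the F distribution) is omitted. The data in the example are invented.

```python
def rm_anova_f(data, epsilon=1.0):
    """One-way repeated measures ANOVA on a participants x conditions table.

    data: list of rows, one row per participant, one value per condition.
    epsilon: sphericity correction factor (e.g. a Huynh-Feldt estimate),
             assumed to be computed elsewhere; 1.0 means no correction.
    Returns (F, df_conditions, df_error) with both df scaled by epsilon.
    """
    n = len(data)      # participants
    k = len(data[0])   # conditions
    grand = sum(sum(row) for row in data) / (n * k)

    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    # Partition the total variability into conditions, participants, and error.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)

    # The correction leaves F unchanged and only shrinks the degrees of freedom.
    return f_value, epsilon * df_cond, epsilon * df_error


# Invented toy data: 3 participants x 3 conditions.
toy = [[1, 2, 3], [2, 3, 5], [3, 5, 6]]
f_value, df1, df2 = rm_anova_f(toy)
```

Because the correction only rescales the degrees of freedom, fractional values such as those reported above arise from multiplying the uncorrected degrees of freedom by an epsilon below 1.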

LIMITATIONS
Several limitations of our study might have influenced the results. Firstly, the design of the VR application used a simplified avatar, which could have hindered participants' ability to embody the avatar. Secondly, the concept of an aural perspective is, in itself, relatively unexplored, and participants might not have noticed it due to this unfamiliarity. Thirdly, our sample primarily consisted of students from a single university, which could have skewed the age distribution toward a younger population and introduced other demographic-specific outcomes that affect mindfulness [12]. Our sample could also inadvertently have had an effect on technology-related factors, such as familiarity with VR technology. Lastly, we also limit our scope to decentering as a proxy for beneficial outcomes that have been associated with mindfulness.

CONCLUSION
To the best of our knowledge, this is the first study to investigate the effect of perspective on mindfulness-related outcomes in VR, contributing to the growing field of VR and mindfulness by exploring the notion of virtual body representation and perspective, including visual and aural feedback. Our sample indicated no significant difference between visual and aural perspective in terms of decentering during a VR mindfulness exercise, thus suggesting that a distanced self-representation in VR alone might not be enough to create a psychological sense of distance, i.e., an increased sense of meta-awareness. As this study indicated no specific benefit or detriment from any condition, it encourages experimentation and diversification when designing mindfulness VR exercises in favour of engagement, personalisation, and user retention. Understanding the reason for our outcome will require further investigation into participants' sense of virtual embodiment as it relates to their sense of embodied mindfulness, as well as other dimensions of embodied mindfulness, which will be expanded on in future research. Future work might also investigate more explicit approaches for detecting physiological states, e.g., electroencephalograms (EEGs), and representing them to the user, as this might give a clearer representation of an individual's internal state and might have a greater effect on decentering. While we also limit our scope to decentering as a proxy for mindfulness-related outcomes, further research might explicitly focus on specific benefits and the extent to which constructs such as decentering are associated with a lasting effect. Finally, future studies might diversify the sample to include participants from more diverse backgrounds as well as investigate the effects of state measures and demographic factors on the measured outcomes.
the Academy of Finland Flagship Programme [Grant No. 337653 - Forest-Human-Machine Interplay (UNITE)].

Figure 1: The visual first-person version (left) and third-person version (right) of the VR application

Table 1: Descriptive statistics of Detachment from Automatic Thinking for the three conditions