Watch This! Observational Learning in VR Promotes Better Far Transfer than Active Learning for a Fine Psychomotor Task


For observational learning, we further compared three levels of similarity of the demonstrator avatar to the user (c): self avatars (d), minimally similar (e), and dissimilar avatars (f). Both active and observational learning approaches were effective; however, observational learning significantly improved the ability of users to transfer their learning to real-world tasks beyond the context learned in VR (g) compared with active learning. We did not find any significant effects of avatar similarity on learning.

ABSTRACT
Virtual Reality (VR) holds great potential for psychomotor training, with existing applications using almost exclusively a 'learning-by-doing' active learning approach, despite the possible benefits of incorporating observational learning. We compared active learning (n=26) with different variations of observational learning in VR for a manual assembly task. For observational learning, we considered three levels of visual similarity between the demonstrator avatar and the user, dissimilar (n=25), minimally similar (n=26), or a self-avatar (n=25), as similarity has been shown to improve learning.
Our results suggest observational learning can be effective in VR when combined with 'hands-on' practice and can lead to better far skill transfer to real-world contexts that differ from the training context. Furthermore, we found self-similarity in observational learning can be counterproductive when focusing on a manual task, and skills decay quickly without further training. We discuss these findings and derive design recommendations for future VR training.

INTRODUCTION
Virtual reality (VR) is increasingly being used to teach people new skills in the workplace by immersing them in realistic training environments [129]. Advances in commercial head-mounted displays (HMDs) make VR a convenient and cost-effective alternative where training would otherwise be high-risk, dangerous or expensive [37]. Training in VR can be more enjoyable than its real-world counterpart and has been shown to be as effective [62], and in some cases more effective [51], for learning. As a result, VR training is being deployed across a wide range of industries including healthcare, nuclear, transportation, and aerospace [37], and analysts have predicted VR training has the potential to boost global GDP by $294.2 billion by 2030 [98]. The overwhelming majority (≈78%) of industrial VR training use cases involve procedural and psychomotor skills [101], i.e., skilled movements that require coordinated motor action and cognition [107]. The most common skills are manual tasks that involve learning a procedure or sequence of actions that require the user to grasp and manipulate objects, such as construction [2,4,11,88], dental [84] and surgical [12,46] procedures, equipment operation [31,83,99,117], and tool use [92].
One of the advantages of VR training is that it offers a platform for 'hands on' learning. As a result, experiential and constructivist learning theories are the most commonly implemented frameworks when developing VR training [101] and the predominant way to teach skills in VR is to utilise active learning approaches, i.e., 'learning-by-doing'. These approaches guide an individual to complete each step of the procedure while learning the skill through rehearsal [2,12,16,31,84,95]. Physical practice is essential to psychomotor learning, which occurs in three stages: cognitive, which involves remembering and understanding the required skill; associative, where the skills are refined through rehearsal; and finally autonomous, where the skill is automatically replicated with maximum efficiency and minimal conscious effort [29]. Research has shown that virtual practice can prepare individuals in the same way as real-world practice [82] and that active learning in VR can be highly effective compared to traditional real-world training, especially when learning a manual assembly task [2,3,35,51,62].
Despite the emphasis on 'learning by doing', there is evidence to suggest that in the early cognitive stages of learning [29], observing a demonstration is beneficial [6,26]. This is thought to be true especially when the demonstration involves a human model because it leverages an innate bias to 'learn by watching' others which is less cognitively demanding [26,73,90]. Observational learning ('learning by watching') is an effective and commonly used approach for acquiring psychomotor skills outside of VR [18,40,45,59,64,102,118] which has been shown to provide equivalent outcomes to active learning whilst offering potential benefits of time efficiency [34,70,131] and reduced cognitive load [119,126]. In particular, combining observational learning and physical practice is thought to be one of the most efficient and effective approaches to real-world training [6,59,102]. VR training accompanied by observation of a real-world demonstration can be more effective for learning manual tasks than VR training alone [116]; however, it is not always practical to watch a real-world demonstration and it is unclear whether observation can be effectively integrated in VR.
Observational learning theories predict that model similarity impacts learning effectiveness [10,23,24]. One theory is that the action-observation network and overlapping mirror neuron system, which enables us to learn by imitating others, fires more as the similarity between ourselves and the model we are watching increases [7,20,23,67]. The highest degree of model similarity is achieved using a self-model, and prior work has demonstrated that video self-modelling can be used to enhance manual fine psychomotor skills (e.g., in a cup-stacking task [43] or playing a video game [60]). Existing research into whether this phenomenon occurs using avatars has shown that learning is more effective when the avatar demonstrator is made to look increasingly similar to the user, either by matching just the skin tone, hair colour, and gender [27] or by using a photorealistic avatar [33]. However, this has only been shown in the context of gross psychomotor skills, and understanding whether self-model avatars have the same advantages as video self-models for fine psychomotor learning is an important design consideration for a wide range of VR training applications that could utilise the observational learning paradigm.
While both active and observational learning approaches have merits, the current VR literature is dominated by active learning approaches and it is not well known how 'learning by watching' an avatar might compare. While comparisons between active VR training and alternative real-world training approaches (e.g., instruction manuals, videos, or AR) are common [3,34,35,51,82,89,100], there are few comparisons of different learning approaches within VR training [57]. Additionally, it is difficult to retrospectively compare different learning approaches from the VR training literature because very few applications expose the learning theory or approach that has guided their implementation [101].
To effectively compare different learning approaches in VR, it is important to assess if skill transfer to the real world has taken place [14,69,82,92]. Skill transfer in VR is commonly assessed using 'near' transfer tasks where the real-world task is identical to the one experienced in VR [2,82,125]; however, 'far' transfer tasks, where the taught skills are applied in a dissimilar real-world context [39,78], are underexplored.
We aim to address these research gaps relating to the relative efficacy of observational learning in VR and the role of avatar similarity in learning manual psychomotor skills by answering the following research questions:
RQ1 How does active learning compare to observational learning of a fine psychomotor task in VR?
RQ2 How does active learning compare to observational learning when transferring skills to the real world?
RQ3 How does demonstrator similarity affect observational learning?
We begin by presenting an interview study (n=22) with a range of industry stakeholders who have experience designing, developing, delivering, or using VR training. This provides supporting evidence about the prevalence of active learning and importance of skill transfer in industry VR training applications, which complements prior reviews of academic literature [37,101,129], and leads to design considerations for how to approach VR training in our study. To address our research questions, we conducted a between-subject user study (n=102) over three sessions, which compared the effectiveness of active and observational learning of an assembly task in VR. To evaluate and compare active and observational learning and their applicability to VR training in industry, we used a retention task in VR (RQ1), as well as near and far transfer tasks in the real world (RQ2). These tasks were conducted immediately after training, and after a 10-14 day delay to understand if and how learning decayed over time. To explore RQ3, we compared different demonstrator representations for the observational learning condition: fully customised realistic avatars of the user, minimally customised avatars, and dissimilar avatars.
This paper provides the first evidence to show that observational learning can be effective for learning a fine psychomotor task in VR when combined with 'hands-on' practice, and can lead to better far transfer to more difficult tasks where distractions are present than active learning. This has important implications for how VR training should be delivered because there are very few use cases where a real-world task does not involve any variation, and it highlights the methodological importance of including far transfer tasks when assessing VR training approaches. We validated the poor retention of learned skills after a prolonged period shown in previous research [82], confirming the importance of when VR training is delivered. In contrast to prior work on learning of gross psychomotor skills, our analysis reveals that observing a self-avatar does not improve learning and can be distracting due to the novelty effect of the user seeing themselves virtually. This is important to consider because most learners who participate in VR training are unlikely to have experienced VR or their own self-avatars. In summary, we contribute empirical evidence showing the following:
(1) The prevalence in industry of active learning in VR.
(2) The effectiveness of observational learning in VR when coupled with practice.
(3) The superiority of observational learning over active learning for far transfer effects.
(4) Increasing avatar similarity does not improve learning for fine psychomotor skills.
(5) Learned fine psychomotor skills decay quickly without further training.

RELATED WORK

Fine Psychomotor Skills Training in VR
Training fine psychomotor skills, such as those required in assembly operations, represents a large proportion of costs in industry (e.g., manufacturing [96]). VR training reduces the amount of time spent training in sub-optimal conditions (e.g., on an assembly line) where mistakes can be dangerous and expensive [14,111,125]. VR is regarded as a promising training tool, even for complex and demanding tasks [34], offering cost and time savings whilst providing contextualised 'hands on' training without requiring access to or interfering with the real environment or equipment [34,125]. Virtual training has been shown to be effective for skill acquisition, for example when learning a 25-step assembly of an electronic actuator [34] or a 19-step procedure to change the O-ring of a dosing pump [125]. Subsequent immediate testing has shown that this training transfers to the equivalent real-world task.
The standard approach for acquiring fine psychomotor skills virtually is 'learning by doing' or active learning, in which an individual is provided with a first-person perspective and interacts with the virtual environment using their hands or controllers to learn the skill through practice and rehearsal (e.g., surgery [56], pump (dis)assembly [125]). This is usually achieved using multi-modal guidance which may include text and/or audio instructions, object highlighting to identify relevant components, animated demonstrations, and haptic feedback [2,4,11,88,95,99,125]. This style of virtual training has been compared to other forms of training such as paper-based instructions and instructional videos and has been shown to produce comparable learning outcomes in many cases. For example, VR training equipped with animations to teach users to complete puzzle assembly tasks produced the same learning outcomes as when users were provided with paper instructions, an instructional video, and could practise with the physical blocks [82]. However, active VR training applications are rarely compared to other styles of VR training, a gap we address in this paper by comparing active and observational approaches.

Learning by Observation
In observational learning, the learner does not engage physically in performing the task but instead watches the actions of another person, mentally recording the information to reproduce it later. Within real-world training this is commonly employed using a master-apprentice model in which an experienced worker demonstrates the procedure whilst the apprentice observes, listens, and then follows their instruction [115]. This method of learning is thought to be highly effective because there is a proposed functional equivalence between action observation and action execution linked to the mirror neuron system, which is active both when observing and performing a skill [53,80,94,119]. The effectiveness of observational learning can be seen by the success and popularity of instructional videos [118]. Prior research in VR has shown how it is applicable to learning gross psychomotor skills [5,27,47,87,128]. However, to date observational learning has not been explored fully as a VR instructional method for fine psychomotor skills, which represent the most popular VR training applications in industry.
'Learning by watching' offers possible benefits, allowing learners to build an accurate cognitive representation of the skill first which they can use to guide their actions [8,15]. This reduces the likelihood that the learner will develop false patterns of behaviour which are difficult to correct later, compared to 'learning by doing' where an individual may incorporate errors into their cognitive representation [127]. Additionally, observational learning is thought to reduce cognitive load compared to 'learning by doing', especially in novices, allowing a learner to direct more attentional resources to concentrate on understanding the steps and strategy, rather than allocating resources to completing the task [127]. This in turn contributes to more effective learning [77,112], long-term retention, and transfer [113]. Nevertheless, even with observational learning, practice is essential for the latter associative and autonomous phases of learning a psychomotor skill [29], and training for real-world tasks has shown that observational learning without practice is inferior to active learning whilst observational learning with practice produces comparable outcomes [112]. Therefore, we investigate observational learning combined with 'hands-on' practice.

Effect of Avatar Similarity on Learning
The appearance of the model can impact the success of observational learning [10], with models that are more closely aligned to the user further enhancing skill acquisition [33,52,55,118]. Feedforward learning techniques extend this to include the 'self' as a model, providing a preview of what the individual could achieve in the future [23,24], and have predominantly been implemented using video. Video self-modelling interventions have shown promise for enhancing skills that contain manual tasks. In a cup-stacking task, video self-modelling enhanced skill acquisition over practice alone when the viewing perspective required the least mental rotation [43]. Contradicting these findings, performance in a Lego-building task did not differ significantly whether participants were exposed to a video containing a self-model, another person, or physical practice alone [68]. However, all participants had access to the instruction manual during the assembly trials which may have hidden any possible effects of the modelling technique. Other research has provided evidence in favour of using video self-modelling for manual tasks, such as enhancing self-efficacy in playing a musical instrument [79] and increasing video game skill [60]. In this study we explore whether the same effects can be elicited by increasing avatar similarity during VR training of a manual task.
Prior work has shown that the design of avatars used in educational virtual environments is an important consideration because it impacts learning processes including performance, intrinsic motivation, and cognitive load [21,93]. When taking an observational learning approach to learning manual tasks in VR, the form of the demonstrator can vary, ranging from simply a model of a hand to a full-bodied avatar [92]. In this paper, we use full-bodied avatars as demonstrators to mimic a real-world master-apprentice model. The visual representation of such avatars (i.e., body shape, facial features, gender, ethnicity, age, and clothing [36]) should be appropriate considering the context of the learning content/topic, the learner's demographics, and the surrounding virtual environment [121]. For example, an avatar dressed as a scientist would be more appropriate for learning science than art [120]. Learner demographics can also affect how an avatar is perceived and how it affects the learner, such as a female scientist avatar enhancing learning for female students, possibly due to greater identification with the avatar [72].
Increasing the similarity between the appearance of a pedagogical agent and the user has been shown to produce greater engagement [65], self-efficacy [52], and performance [55,103] in virtual education environments. The effects on motivation, performance, and self-efficacy are often greatest when perceived avatar similarity is highest and deteriorate when the avatar used becomes dissimilar, e.g., an idealised avatar [58,103]. The closest related work provides evidence that increasing avatar similarity via customisation of gender, skin tone, and hair colour [27] or full customisation [33] improves training over the use of less similar avatars in the context of gross psychomotor skills. To see whether the same effects are observed in fine psychomotor skills training, we explore the consequences of using dissimilar, similar, and self avatars.

Measuring Learning Effects
A psychomotor skill is said to be learned when an individual can perform the action sequence efficiently, with minimal errors and increasing speed and accuracy (Cratty, n.d.; cited by [114], p.13). Within VR training contexts, the skill should also be repeatable in the real world [50,66]; this is often referred to as skill 'transfer'. Transfer can be considered 'near' if it involves unchanged repetition of the acquired skill and is therefore highly relevant for 'closed' psychomotor skills which involve repeating the same task without any variation, such as hand-washing by a medical professional [78,86,97]. Near transfer, for example repeating the same task in the real world, is often used as an indicator of the effectiveness of VR training [14,34,44,82,92]; however, far transfer, which requires the learner to adapt what they have learned (e.g., to a new task or environment; [39,78]), is often not tested. Despite many VR training environments offering a very close match or digital twin of the real-world task [2,125], real-world environments can be changeable and there are very few closed psychomotor skills in industry. For example, in VR the parts and tools are often laid out carefully in a particular order/position and possible distractions are rarely simulated [2], but this is unlikely to be matched in the real world. Therefore, in this study, we perform both near and far transfer tests in the real world to see whether the learning is robust enough for the skill to be repeatable when the components are in an unfamiliar position, are not coloured, and whilst having to pay attention to auditory information in the environment.
The time between being trained in VR and utilising the skill is another variable that is important to consider. Depending on the context, many situations might require an employee to apply skills that they have been taught but not recently carried out [38,61] and skill acquisition can be erratic. Immediate effects are not always evident [124] and, even when they are, may lack permanence [101]. Therefore, in this study we also consider performance immediately following the training and again after a 10-14 day delay.

INDUSTRY INTERVIEWS
Fine psychomotor skills, such as assembly tasks, are among the most commonly evaluated in the VR training literature [1]. Prior work has emphasised the importance of these skills for industrial use cases, and the number of VR training applications for training workers continues to grow [37,101,129]. However, the instructional strategies and how they are implemented in VR training are rarely discussed [1]. To provide additional evidence for the use of VR training in this context and uncover how this type of training is typically structured, semi-structured interviews were conducted with industry stakeholders. We use the findings to motivate the methodology and inform the development of the VR training employed in our study.

Participants
We recruited participants who have experience designing, developing, delivering, or using workplace VR training via adverts posted on LinkedIn and Twitter.We interviewed 22 individuals (CEOs, Founders, Consultants, Developers, Producers, Managers, Directors, Vice Presidents) from a range of companies (Consulting, R&D, Marketing, Enterprise VR), services (Policing, Fire), and institutes.

Procedure
Anyone with relevant experience who expressed interest was given an information sheet to read before proceeding to sign up to participate. Each semi-structured interview was conducted via video call and lasted approximately 20-30 minutes. The interviewer reminded participants of the information sheet before gaining consent for the interview to be recorded. The interviews were focused around four main questions, first exploring what tasks are being simulated and taught using VR in industrial contexts ("What do you/the company you work for use VR training for?", "What types of tasks? Could you walk me through an example?"). After establishing the types of tasks that are trained using VR, using the given examples of any tasks containing psychomotor elements, the interviewer probed into understanding how this training was approached, what actions the learner performs in the virtual environment and how they interact in VR ("Within the examples of VR training you are familiar with, what kinds of actions are involved? By this I mean how does the user interact? What types of actions do they have to perform during the training?"). Finally, the goals of the training and how/whether these are assessed were discussed with a particular focus on the metrics used. At the end of the interview, the experimenter thanked the participant for their time and offered them the opportunity to ask any follow-up questions they might have. All interviews were recorded, auto-transcribed and corrected. Participants were allowed to share their screens and utilise the chat function to share resources, videos, and images to demonstrate the virtual training that they were referring to.

Summary of Findings
A reflexive thematic analysis was used to analyse the interview transcripts and generate overarching themes. A data-driven inductive coding process was used to identify a number of codes that were grouped into sub-themes under the overarching themes [13].
Active Learning is the Dominant Approach. Training of industrial procedures that fall under the psychomotor domain was unanimously achieved through 'learning-by-doing', i.e., active learning. The learner would gain hands-on experience of doing the procedure ("it's completely active learning right, it's completely learning by doing"). Within the VR environment, this would involve the learner having a first-person perspective ("so all these use cases, you are first person, you're immersed in the experience"), so that they were in a position to do the tasks themselves ("gives you the ability to interact and do the task yourself, which actually enhances the training quotient of VR training") and physically practise, allowing them to gain muscle memory for the procedure ("get a feel of actually carrying out these activities themselves, which you know becomes a muscle memory").
Users are Guided to Perform the Correct Actions. The learner is guided to complete the actions in the correct order ("it pretty much told them exactly what to do"; "putting someone in a room and taking them through a sequence of tasks"). This is usually achieved using a mixture of visual ("highlight the object"; "it will flash this kind of guided-mode like rotate the engine block to 180"), audio ("voice-based guidance"), and text cues ("usually a text box"). Sometimes a demonstration is incorporated where the learner observes how to complete the procedure before having any hands-on experience ("Some client requires they would like to have it through a third person-based demonstration"). This could be in the form of a virtual tutor ("when they put on the helmet there can be a virtual trainer that teach them how to perform the procedures safely") or a 'ghost' trainer ("within that VR environment have like a ghost trainer sort of thing who's doing it, and you have to follow them so they make steps and you have to go alongside and follow").
Users Practise their Skills Virtually. In a separate practice mode, the learner receives much less or no guidance cues so that they can attempt the procedure from memory ("they just have to remember what they have to do, I think that is quite useful as a progression to actually train someone"). If the learner makes a mistake, they will receive feedback to notify them of their error ("the screen goes red so you know you're making a mistake") and to help them to learn the correct procedure ("we always provide immediate feedback, because the goal is to teach the user the procedures").
Knowledge Retention and Transfer is Important and Should be Measured. During an assessment mode the learner's actions are monitored to give an indication of their proficiency for completing the procedure ("monitor the tasks of the user"; "If you took all the steps it's checked, if the exercise is completed successfully or with errors"). Performance measures collected whilst the learner completes the task include the time taken, the number of attempts, and errors ("did they do it effectively... in the right sequence, in the right time"; "tracking their number of attempts"). Learners are often observed or recorded during this mode so that their performance can be reviewed later. Retention of the steps can also be assessed in the form of a reassessment after completing all training modes ("an individualised assessment after the training to the user").
Transfer activities are also utilised in some instances to further measure a learner's understanding of a skill. This typically involves the learner completing a separate task related to the one they have been learning but with slight differences so that they must demonstrate and use their understanding rather than repeat the exact memorised procedure ("There was a lot more, there's more ambiguity, so it wasn't necessarily like an SOP [Standard Operating Procedure]... they would follow, they would have to actually solve a problem"). This is gearing learners towards transferring their skills to the real world, where scenarios are less rigid and there may be unknown factors ("even though they know how to operate a piece of equipment, there's always going to be a level of ambiguity. That was a really important piece for them actually transferring it to the real world"), requiring the use of more open psychomotor skills.

Design Considerations
Our findings show active training is very common in industrial VR training applications. In contrast, observing a demonstrator [...] interaction with the objects. Guidance on how to perform the procedure should be removed to test their memory but they should be given feedback when they make a mistake.
DC5 In the assessment phase, the user's performance should be recorded on the same task.

METHODOLOGY
We conducted a user study to compare immediate and longer-term training outcomes using active learning compared to observing a demonstrator in VR (RQ1), and how this affects transfer of the acquired skill (RQ2). We also explore whether there are any effects of demonstrator avatar similarity on observational learning of a fine psychomotor skill (RQ3). In a between-subject design, participants learned how to assemble a "Burr puzzle" (a 3D interlocking puzzle) using either an active or observational learning approach. For the observational approach, we also explore the effects of demonstrators having either a dissimilar, matched feature, or self-similar appearance to the user. The allocation of participants to a condition was carefully managed to ensure similar prior experience with VR, mental rotation abilities and baseline movement imagery abilities across the groups. The study received ethical approval from a Research Ethics Committee.
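The text does not state how the allocation was managed. As an illustration only, the following minimal sketch shows one common way such balancing could be approached: each new participant is placed in one of the currently smallest groups, choosing the one that keeps the group means of the covariates closest together. The condition labels, the `imbalance` and `allocate` helpers, and the covariate tuple are all hypothetical, and covariates are assumed to be pre-scaled to comparable ranges.

```python
from statistics import mean

# Hypothetical labels for the four study conditions.
CONDITIONS = ["active", "dissimilar", "matched", "self"]


def imbalance(groups):
    """Sum, over covariates, of the range of group means (empty groups are skipped)."""
    filled = [members for members in groups.values() if members]
    if len(filled) < 2:
        return 0.0
    n_covariates = len(filled[0][0])
    total = 0.0
    for k in range(n_covariates):
        means = [mean(p[k] for p in members) for members in filled]
        total += max(means) - min(means)
    return total


def allocate(participant, groups):
    """Place a participant (tuple of covariate scores) and return the chosen condition.

    Only the currently smallest groups are candidates, so group sizes never
    differ by more than one; ties are broken by the lowest resulting imbalance.
    """
    smallest = min(len(members) for members in groups.values())
    candidates = [c for c in CONDITIONS if len(groups[c]) == smallest]

    def cost(condition):
        trial = {c: list(members) for c, members in groups.items()}
        trial[condition].append(participant)
        return imbalance(trial)

    best = min(candidates, key=cost)
    groups[best].append(participant)
    return best


# Usage: covariates here are (VR experience, mental rotation, movement imagery),
# an assumed ordering with made-up, pre-scaled values.
groups = {c: [] for c in CONDITIONS}
for scores in [(0.1, 0.5, 0.4), (0.9, 0.6, 0.7), (0.3, 0.2, 0.5), (0.6, 0.8, 0.1)]:
    allocate(scores, groups)
```

Restricting candidates to the smallest groups trades some covariate balance for equal group sizes; a study could equally use stratified randomisation or a probabilistic minimisation scheme.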

Burr Puzzle
Participants were tasked with learning to assemble a 6-piece interlocking 3D puzzle, known as a Burr puzzle. We chose this task because it has been used in prior research into VR training of manual tasks and represents a fine psychomotor skill with procedural elements [14,82]. The puzzle was designed using BurrTools 0.6.0 and the precise configuration was selected on the basis of containing 6 unique pieces and having only one solution which could be assembled in a 5-step procedure (see Figure 3). This was chosen so that the puzzle difficulty made it challenging to learn the assembly [17]. We only focus on training one configuration because recalling multiple would increase the difficulty and could overload participants [82]. The Burr puzzle pieces and assembly were modelled for use in VR and the pieces were 3D printed for real-world transfer tasks. The virtual pieces snap together when they are held in the correct position, as is common in VR assembly tasks [19,82,125].
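The snapping behaviour described above can be sketched as a simple tolerance check. This is a hedged illustration and not the study's implementation: the `Pose` type, the single-axis orientation, and the distance and angle thresholds are assumptions chosen for the example.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Pose:
    position: Tuple[float, float, float]  # metres
    yaw_deg: float                        # simplified to one rotation axis (assumption)


def angle_diff_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def should_snap(held: Pose, target: Pose,
                max_dist_m: float = 0.02, max_angle_deg: float = 15.0) -> bool:
    """True when the held piece is close enough to its target pose to snap in.

    Both thresholds are illustrative values, not taken from the study.
    """
    close = math.dist(held.position, target.position) <= max_dist_m
    aligned = angle_diff_deg(held.yaw_deg, target.yaw_deg) <= max_angle_deg
    return close and aligned


# Usage: a piece held 1 cm away and rotated 10 degrees off target would snap.
target = Pose((0.0, 0.0, 0.0), 0.0)
print(should_snap(Pose((0.01, 0.0, 0.0), 350.0), target))
```

A full implementation would compare 3D orientations (for example with quaternions) rather than a single yaw angle.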

Avatar Creation
Avatars were constructed using Reallusion Character Creator 4. Self-similar avatar clothing, body shape, hair colour and style were customised to resemble each participant (see Figure 2). The Headshot plugin was used to generate a face for the avatar based on a photograph of the participant. Avatars used for the dissimilar and matched-feature conditions were given a generic uncustomised male or female body shape and the Headshot plugin was used to generate a face for the avatar based on an AI-generated photograph created using Generated Photos (see Figure 2). Matched-feature avatars had the hair colour, skin tone, gender, and age (young adult, old adult) that the participant identified with the most. Dissimilar avatars had a contrasting skin tone and hair colour, and the gender and age the participant identified with the least (see Figure 2).

Virtual and Real Environments
All virtual environments were created using Godot 3.5 and consisted of a room containing a table and a chair, at which participants were seated throughout. For all conditions, participants embodied their self-similar avatar. Inverse kinematics was used to control the movements of the avatar's arms based on the controller position, and a grip animation was played when the trigger button on the controller was held. The VR training was composed of separate Training, Practice, and Assessment phases (DC2), which are described below.

Familiarisation Scene.
A virtual mirror, some 3D objects, and text instructions were added to the environment to familiarise participants with the controllers and their self-similar avatar (see Figure 4a). The experimenter verbally instructed participants to complete the familiarisation procedure, which taught them how to interact with virtual objects, including how to pick up and put down objects, what red and green highlighting indicates, how to assemble two pieces, what happens if they drop an object, and how to pass objects between their hands. Participants could see themselves represented as their self-similar avatar in a virtual mirror throughout ... (see Figure 4c). During the active training scene, they were given a first-person perspective and could interact with the objects directly (DC1). Each step of the assembly was guided by a text instruction and an animation prompt displaying how to connect the next piece (DC3) [119,126]. These would progress automatically once the participant had completed the step being shown. The training scene ended once participants completed all 5 steps, or when a timer node ended the scene after 5 minutes.

Observational Training Scene.
Participants in the observational learning conditions were given text and audio instructions describing what they would experience in the observational training scene (see Figure 4b). In this scene, participants were shown how to complete each step of the task by observing a third-person perspective demonstration (DC3). A professional motion capture studio was used to record an expert assembly of the physical puzzle, which was converted into an animation performed by the demonstrator avatar with the virtual pieces. The duration of the assembly was 30 seconds and it was played twice each time. To mimic pairwise training, whereby people work together to observe and then complete the puzzle [112], and to provide a more effective viewing angle [43], the participant sat next to the demonstrator avatar and observed how to complete the assembly task (see Figure 4b).

Practice and Assessment Scenes.
We created a virtual environment in which participants could assemble the puzzle without guidance to allow them to practice and test their learning (DC4). 3D-modelled puzzle pieces were positioned on the table for participants to interact with and assemble. If pieces were dropped on the floor, they would reappear in their position on the table. Some feedback was provided to participants in the form of object highlighting, indicating whether a piece can (green) or cannot (red) be snapped together (DC4; see Figure 4d). The assessment scene was used for both the baseline and retention tests; however, object highlighting was disabled so that participants did not have help with the assembly task (see Figure 4e). In this scene the participant's performance was recorded (DC5).
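The feedback and drop-handling rules described above can be sketched as follows. This is a minimal illustration, not the study's Godot implementation: the function names, the 2 cm snap tolerance, the y-up coordinate convention, and the position-only check (a real snap test would presumably also compare orientation) are all assumptions made for the example.

```python
import math

# Assumed tolerance (metres); the paper does not report the value used.
SNAP_TOLERANCE_M = 0.02


def highlight_colour(piece_pos, target_pos, highlighting_enabled=True,
                     tolerance=SNAP_TOLERANCE_M):
    """Return 'green' if the piece can snap, 'red' if not, None if disabled.

    Highlighting is shown in the practice scene and disabled in the
    assessment scene, so the caller passes highlighting_enabled=False there.
    """
    if not highlighting_enabled:
        return None
    can_snap = math.dist(piece_pos, target_pos) <= tolerance
    return "green" if can_snap else "red"


def respawn_if_dropped(piece_pos, home_pos, floor_height=0.0):
    """Return the piece's table position if it has reached the floor."""
    # Assumes a y-up coordinate system, so index 1 is the height.
    if piece_pos[1] <= floor_height:
        return home_pos
    return piece_pos
```

Keeping the highlight decision separate from the respawn rule mirrors the text: highlighting is feedback that can be switched off for assessment, whereas respawning applies in every scene.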

Real World Transfer Assessment.
To test transfer to the real world, two versions of the puzzle were 3D printed at the same scale. The near transfer test used pieces that had the same colour coding as the virtual pieces, and a paper template ensured that the pieces were laid out in the same order and orientation (see Figure 5a). The far transfer test was a dual task in which participants completed the same puzzle, but using non-colour-coded 3D printed blocks arranged in a different orientation (see Figure 5b), whilst simultaneously 'tone counting' audio beeps played at random intervals to increase the overall level of difficulty.
We used Valve Index controllers to allow robust and comfortable interaction. A GoPro Hero 11 was used to record the real-world assembly tasks for analysis purposes.

Outcome Variables
The primary outcome measures in this study are performance in the retention (in VR) and transfer (near and far) tests, indicated by the number of pieces assembled correctly and the time (seconds) to complete the puzzle assembly. A participant succeeds if they assemble all 6 pieces within 180 seconds. For the far transfer test, the number of tones identified was also measured.
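The success criterion above can be written as a short check. This is a sketch only: the `Attempt` record and its field names are hypothetical, and "within 180 seconds" is read here as inclusive of exactly 180 seconds.

```python
from dataclasses import dataclass

TOTAL_PIECES = 6        # the Burr puzzle has 6 pieces
TIME_LIMIT_S = 180.0    # maximum time allowed per test


@dataclass
class Attempt:
    """Hypothetical record of one retention or transfer test."""
    pieces_correct: int   # number of pieces assembled correctly
    time_s: float         # time taken, in seconds


def is_success(attempt: Attempt) -> bool:
    """A participant succeeds if all pieces are assembled within the limit."""
    return (attempt.pieces_correct == TOTAL_PIECES
            and attempt.time_s <= TIME_LIMIT_S)
```

For example, `is_success(Attempt(pieces_correct=6, time_s=120.0))` is true, whereas an attempt with 5 correct pieces counts towards the number-of-pieces outcome but not towards the success rate.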
Secondary outcome measures included movement imagery to further indicate encoding of the procedure and skill in long-term memory [25,54]. Baseline imagery ability was assessed using the revised vividness of movement imagery questionnaire (VMIQ-2) [105], which asks people to rate on a 5-point Likert scale how well they can imagine performing each action (1 = 'perfectly clear and vivid as normal vision', 5 = 'no image at all, you only know that you are thinking of the skill') from their own perspective to measure internal visual imagery (IVI); from someone else's perspective to measure external visual imagery (EVI); and in terms of the feeling of doing the actions to measure kinaesthetic visual imagery (KVI). To measure imagery of the puzzle assembly, we replaced the generic items (e.g., 'Bending to pick up a coin') with task-specific items (e.g., 'Manipulating and orienting the final piece into position') [74]. Scores for each subscale are calculated by summing each rating and dividing by the number of items, with lower scores indicating more vivid imagery.
Perceived competence and intrinsic motivation were measured using the Perceived Competence (PC) and interest/enjoyment (I/E) subscales of the Intrinsic Motivation Inventory (IMI) [75]. Self-efficacy is assessed with a task-specific questionnaire developed according to Bandura's guidelines [9] that measures the strength of an individual's confidence (0-100) in their ability to execute increasingly difficult activities (e.g., assembling 2/6 up to 6/6 pieces). Self-efficacy is calculated by summing all certainty scores and dividing by five, the number of performance standards.
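The two questionnaire scoring rules described above (mean rating per imagery subscale, and mean certainty across the five performance standards) can be sketched as follows. The function names and the input validation are assumptions for illustration; the paper only specifies the arithmetic.

```python
def imagery_subscale_score(ratings):
    """Mean of 1-5 vividness ratings; lower scores mean more vivid imagery."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 5-point scale (1-5)")
    return sum(ratings) / len(ratings)


def self_efficacy_score(certainties):
    """Sum of 0-100 certainty scores divided by the five standards.

    The five standards correspond to assembling 2/6 up to 6/6 pieces.
    """
    if len(certainties) != 5:
        raise ValueError("expected one certainty score per standard (5)")
    if any(c < 0 or c > 100 for c in certainties):
        raise ValueError("certainty scores must lie between 0 and 100")
    return sum(certainties) / 5
```

Both scores stay on the scale of the original items, which keeps subscales with different numbers of items comparable.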
Other measures include potential covariates such as prior VR experience, preferred learning style, mental rotation ability, and mental effort. Prior VR experience was measured using a single-item rating scale ranging from 0 ('Never used VR before') to 4 ('I use VR often and have developed my own environments in VR'). The Learning Style Scale (LSI) [106] was used to assess individuals' preferences towards concreteness versus abstractness (ACCE; 7 items; e.g., 'I like to be specific' versus 'I like to remain flexible') and reflection versus action (AERO; 7 items; e.g., 'I value patience' versus 'I value getting things done') on 6-point bipolar scales. High scores emphasise preferences toward abstract conceptualisation and active experimentation. The Revised Purdue Spatial Visualization Tests: Visualization of Rotations (Revised PSVT:R) [130] was used to assess mental rotation ability; it contains 30 questions in which an individual is asked to mentally rotate 3D objects. Participants select an answer from five options, and their score is given by the number of correct answers. Mental effort was measured using the simulation task load index (SIM-TLX) [41], a measure developed for workload demands placed on users in simulated environments such as VR. Participants rate 9 dimensions on 21-point Likert scales: mental demands, physical demands, temporal demands, frustration, task complexity, distraction, perceptual strain, and task control. An additional 5-point Likert scale was used to indicate the usefulness of the training environment. Finally, as presence has been shown to interact with learning outcomes in virtual environments [91,110], the Multimodal Presence Scale (MPS) [71] was used to measure feelings of physical (5 items; e.g., 'While I was in the virtual environment, I had a sense of being there'), social (5 items; e.g., 'I felt like I was in the presence of another person in the virtual environment'), and self (5 items; e.g., 'I felt like my virtual embodiment was an extension of my real body within the virtual environment') presence, scored on a 5-point Likert scale.

Procedure
At the point of recruitment, participants completed an online screening questionnaire. Anyone failing to meet the inclusion criteria was automatically told they were ineligible to take part; otherwise, individuals were directed to sign up. This study was conducted over three sessions.

Session One.
Participants completed demographic, prior VR experience, learning style, imagery and mental rotation ability questionnaires and had their photographs taken for the self-similar avatar creation. The experimenter used the questionnaire responses to allocate participants to a condition (active, dissimilar, minimal, or self-avatar) and created the avatars.

Session Two.
Participants completed the familiarisation task to get used to the VR environment and controls; they were allowed to ask the experimenter for assistance if they needed it. After completing this they were instructed to remove the headset and complete the avatar identification measures. They were then introduced to the Burr puzzle target shape and were given a maximum of 180 seconds to assemble the puzzle (Baseline test).
The trials then commenced, each of which involved two parts: training and practice. In the training phase, participants were either guided through completing the puzzle with text prompts and animations (active learning) or watched the demonstrator avatar complete the Burr puzzle (observational learning). In the practice phase, all participants were given a maximum of 180 seconds to assemble the puzzle, and we recorded the number of pieces assembled correctly and the time taken. We operationalised the trials in this way because practice is essential for the associative and autonomous phases of learning a psychomotor skill [29]. We included practice in the observational conditions because observational learning without practice is inferior, while observational learning with practice has been shown to be comparable to active learning [112].
The training was repeated for a total of 40 minutes, up to a maximum of 10 trials. Afterwards, participants completed the questionnaire measures, and the immediate retention, near, and far transfer tests. We then conducted an interview with participants and asked the following questions to gain qualitative feedback: 'Could you please summarise how you found the virtual training experience?', 'Could you describe your approach/strategy when trying to learn to assemble the puzzle?', 'Could you please tell me how you found observing the demonstrator avatar? [Observation conditions only]', 'How did you feel about your own avatar, the avatar that you embodied in the virtual environment?', 'How about the transfer of skills from VR to real world - did you find that the training helped?', 'Could you envisage using this type of virtual reality training again in the future?', 'Is there anything else you would like to comment on or discuss relating to the VR training experience or the instructor avatar?'. Interviews were recorded and later transcribed.

Session Three.
Participants returned after a 10-14 day delay to complete imagery and self-efficacy questionnaires, and the delayed retention, near, and far transfer tests. Afterwards, participants were debriefed and reimbursed £15 for their time.

Hypotheses
We expect virtual training will improve puzzle assembly skills; however, prior work suggests there will be significant skill decay after 10-14 days [82]. Observational learning theories indicate that model similarity can enhance learning [7,20,23,67], and prior research has shown that observing avatars which are either minimally similar to users [27] or photo-realistic self-avatars [33] can provide a feedforward effect which improves learning and therefore task performance. We hypothesise that:
H1: Performance in VR Retention (H1a), Near Transfer (H1b), and Far Transfer (H1c) will be worse after a 10-14 day delay compared to immediate testing.
H2: Performance in VR Retention (H2a), Near Transfer (H2b), and Far Transfer (H2c) following observational learning with self avatars will be better than dissimilar avatars (RQ3).
H3: Performance in VR Retention (H3a), Near Transfer (H3b), and Far Transfer (H3c) following observational learning with minimal avatars will be better than dissimilar avatars (RQ3).
We do not have hypotheses for the comparison between learning techniques (RQ1 & RQ2) because using observational learning for fine psychomotor skills is underexplored in VR training and, to our knowledge, we are the first to directly compare active instruction to observing an avatar demonstrator in VR.

Participants
A sample of 102 participants (55M, 47F), aged 17-63 (M = 31.3473, SD = 11.057), recruited through mailing lists, social media, and posters, completed session one and session two. Of these, 99 returned to complete the third session; however, 6 returned outside of the 10-14 day window due to illness or holidays. All participants were screened prior to taking part to ensure they were aged 16 or over, had normal or corrected-to-normal hearing and vision, displayed no sign of colour blindness, did not have any movement-related conditions, and did not have extensive experience in completing Burr puzzles. The Ishihara test for colour deficiency [30,49] was used, in which participants must identify the number or presence of lines in 38 pseudoisochromatic plates, and anyone deemed to have colour vision deficiency was screened out. Participants were asked to rate their familiarity with Burr puzzles (0 = 'Never heard of it' to 3 = 'I have solved many Burr puzzles'), and anyone scoring 3 was also screened out and excluded from the study.

RESULTS
To assess the effectiveness of the different learning approaches we analysed the success rate and number of pieces assembled. Tests of normality revealed the data for both success rate and number of pieces assembled were non-normal; therefore, where appropriate, we report the median and interquartile range as descriptive statistics. To compare active learning to observational learning (RQ1 and RQ2) we analyse success using binomial generalised linear mixed-effects models and the number of pieces assembled using repeated measures proportional ordinal logistic regression across both immediate and delayed tasks. Assumptions for binomial generalised linear mixed-effects models were validated using simulation-based dispersion tests from the DHARMa R package and visual inspection of Q-Q plots. Assumptions for proportional ordinal logistic regression were validated using the test of proportional odds. To explore immediate and delayed differences between conditions we conduct Wilcoxon rank-sum tests. For RQ1 and RQ2 we compare active against each of the observational conditions, and for RQ3 we perform all pairwise comparisons between the observational conditions. All post hoc tests are corrected using the Holm-Bonferroni method to account for multiple comparisons. All data, R scripts and detailed results are available in Supplementary material. Descriptive statistics for the questionnaire measures, the number of people who succeeded, and the number of pieces assembled are available in Table 1, Table 2 and Table 3, respectively.
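For readers unfamiliar with the Holm-Bonferroni step-down correction applied to the post hoc tests, the following is a minimal sketch in Python. It is not the authors' R code (which is in the Supplementary material); the function name is hypothetical, and the procedure shown is the standard one: sort the p-values, multiply the k-th smallest by (m - k + 1), enforce monotonicity, and cap at 1.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, etc.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values may never decrease as raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

With three comparisons, as when active learning is compared against each of the three observational conditions, raw p-values of 0.01, 0.04 and 0.03 become approximately 0.03, 0.06 and 0.06. This also shows why many adjusted values in a family of non-significant tests are reported as exactly 1.000: the product is capped at 1.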

Manipulation Checks
One-way ANOVAs were conducted to assess the balancing of the groups in terms of their existing abilities and preferred learning styles. There were no significant differences between the groups for number of pieces assembled in the baseline test, general imagery abilities (EVI, IVI, and KVI), and ACCE, AERO, and PSVT:R scores. Prior VR experience was on average none to minimal; however, there was a significant difference between the groups ( (3) = 11.181,= .011, 2 = 0.118). The minimal group had significantly less exposure to VR than the dissimilar ( dif = 0.509, = .023) and self groups ( dif = 0.469, = .038) before taking part. There was no significant difference between the active and any of the observational groups ( ≥ .098). Visual inspection of scatterplots revealed no apparent relationship between prior VR experience and performance on the retention and transfer tests. There was no significant correlation between prior VR experience and number of pieces assembled ( (102) ≥ .016,≥ .291) or time to complete ( (102) ≥ .013,≥ .201) in any of the tests.
Our manipulation of demonstrator avatar similarity worked as intended, self-avatars were perceived as being the most similar ( = 5.760, = 0.
A similar binomial logistic regression indicated that there was no significant change in the odds of succeeding in the far transfer test with delayed testing in the active condition, and there were no significant interactions between the observation conditions and delayed testing compared to the skill decay experienced in the active condition ( ≤ 1.480, (192) ≤ |0.313|, ≥ 0.347, = 1.000, ≤ |0.524|). Further analyses revealed no significant interactions between the observation conditions and the time of testing (immediate/delayed) relative to each other ( ≤ 1.364, (142) ≤ |0.285|, ≥ 0.776, = 1.000, ≤ |0.175|). Overall, there were no significant differences between the likelihood of succeeding in the immediate far transfer test compared with the delayed far transfer test. This finding was consistent in all conditions; therefore, we reject H1c.

VR Retention.
To test overall differences in the odds of succeeding in the VR retention tests in the active versus observational conditions, a binomial logistic regression was run. There were no significant differences in overall retention success rates between active and observation conditions ( ≤ 1.312, (196) ≤ 0.651, ≥ 0.515, = 1.000, ≤ |0.150|). Similarly, an ordinal logistic regression on the number of pieces assembled in the VR retention tests revealed no significant differences between the active and observation conditions, indicating similar performance in the VR puzzle assembly task whether active or observational learning was used ( ≤ 1.208, , |0.614|, ≥ 0.539, = 1.000, ≤ |0.135|).
To further investigate possible differences in performance between the active and observational conditions, pairwise Wilcoxon tests were conducted. These revealed no significant differences in the number of pieces assembled in the immediate and delayed retention tests, nor in the completion time between the conditions on the immediate and delayed retention tests ( ≤ |445.000|,≥ .051,≥ .154).

Real World Near Transfer.
Binomial logistic regression for the near transfer tests revealed no significant differences in the odds of succeeding between the active and observational conditions, indicating similar levels of overall success in the real world near transfer task ( ≤ 2.518, (196) ≤ 1.758, ≥ 0.079, ℎ ≥ 0.236, ≤ |0.509|). An ordinal logistic regression was used to test for any differences in the number of pieces assembled in the near transfer tests, revealing no significant difference in the number of pieces assembled using observation compared with active learning conditions ( ≤ 2.453, ≤ |2.393|, ≥ 0.017, ≥ 0.05, ≤ |0.495|). To further confirm this, Wilcoxon tests revealed no significant differences in the number of pieces assembled or completion time following active and observational learning in both the immediate and delayed near transfer tests ( ≤ |458.000|,≥ .024,≥ .072). An ordinal linear regression also revealed significant differences in the number of pieces assembled in the far transfer tests between active and all observational conditions. There were significant positive effects of all observational learning conditions compared to active learning (Active v Dissimilar:

Imagery.
We used two-way mixed ANCOVAs to test the effect of delayed testing relative to immediate across training conditions on task-specific internal (IVI), external (EVI), and kinaesthetic visual imagery (KVI), whilst controlling for the respective general visual imagery ability as a covariate.
No main effect of condition or interaction effects were found. Pairwise comparisons also found no significant differences between the active and observational conditions. ... main or interaction effects were significant. A one-way ANOVA revealed that there was no significant effect of condition on perceived competence at assembling the puzzle following the training (F(3, 98) = 1.537, p = .210, η² = 0.045). A separate one-way ANOVA indicated that there was a significant main effect of condition on interest/enjoyment (F(3, 98) = 2.819, p = .043*, η² = 0.079); however, Holm-corrected post hoc tests were unable to detect any significant pairwise differences (p ≥ .055). A series of one-way ANOVAs revealed that there was no significant main effect of condition on any dimension of the SIM-TLX, nor was there any significant difference in the reported usefulness of the different training conditions. Additionally, one-way ANOVAs revealed no significant main effect of condition on presence.

Effects of Demonstrator Similarity
To test whether demonstrator similarity affects observational learning, a series of tests were run comparing only the dissimilar, minimal, and self observation conditions. Therefore, we reject H2c and H3c.

Qualitative Results
An inductive coding process as part of a reflexive thematic analysis was conducted to gain further insights into VR training. The overarching themes are discussed using participant quotes as illustrative examples; text in square brackets is used to add context to a quote to make it easier to understand.
VR Training was widely regarded as being effective for learning how to assemble the puzzle ("It was very effective, I didn't know how to solve it in advance and now I can reasonably quickly"; "I just did that quickly, there's no way I would have done that before at all... so yeah really really effective"), despite the complexity of the task ("The task initially seemed quite daunting, but it was a very useful way of going through it."; "to begin with it was hard to do the task without the avatar, but as it went on, I saw the person do it in front of me it became clear").
Users were able to execute the skill in the real world ("My performance in VR accurately reflects how I performed in real life. I basically did the same thing here and there."), which came as a surprise to some participants ("I think it was transfer surprisingly transferable actually, I essentially follow the same process that used in in VR"). However, the far transfer task was deemed the hardest:

I think that when the pieces were put out not in the same orientation it I struggled to identify which way I was expecting to see them combined with the colour; when I did the the one without colours that was a lot harder; It was the sounds causing the stress

Some aspects of the VR training mechanics limited the transferability to the real world ("I don't think they transferred 100%"; "There was a couple of bits and pieces that weren't as easy to transfer"). These were mainly snapping and object rigidity:

everything kind of locked into place when it got right again, something that was massively helpful that didn't mean anything in the real world; you could sort of force them in VR to just go together, whereas you could obviously can't do that; I think that the challenges that we had in the virtual world are different for the one that we had in the real world so somethings I could manage to transfer from the virtual world to the real world. But some things that make our life easier in the virtual world we don't have in the real world

The pieces not 'sticking' in the real world posed a challenge: the pieces could slide apart ("when you put them [Assembled VR pieces] down they stayed still, which I found when doing the real task was not the case. Yeah, bits would fall out"), so more effort was required to assemble the puzzle ("having to support them or something, to wiggle them a little bit to come into place. But I think that was the only complication."), and the pieces not locking together meant participants were unsure if they were correct ("I felt like I was relying a bit on how it auto stuck them together which I realised when I started doing this [transfer tests]"; "once I put the pieces together in the the virtual environment they remains together in the real environment um I could change them"). Having to handle the rigid pieces in the real-world task also proved difficult ("The fact that I could kind of click things through other things [in VR] certainly helped a lot which obviously doesn't transfer outside").
Improvements in the realism of the training were suggested to allow users to better prepare for the real-world task. Despite the controllers providing an adequate proxy ("I think the controllers work. I think if it's simple then the controllers work. The more complicated it would get perhaps I need more developed type of control."), having full finger dexterity would be valued for learning more complex fine psychomotor tasks in VR ("if I could move all my fingers [in VR] it will help me to do the tasks more easily"; "you could see that the instructor, like using finger by finger. I was like yeah, wish we could do that.").
Observing was effective for learning to imitate the actions:

very effective... Yeah I've watched him do it, and then I could do it. I don't know how that to explain that it just it clicks in a certain way; I would just imitate the rotation of each one; just copying it. It was. It was an effective way to do it definitely

But there was a desire to have greater control over the watching component of the training, e.g., their viewing perspective, pausing, and choosing when to observe:

I felt like if I was like at more of a almost like on top of him angle that might have been better; sometimes not seeing it from my perspective was kind of annoying; I imagine that it would be a lot more effective if you could pause; there should be something like I can pause that for some time and then I just have a look; maybe the chance to go back, to the instruction if we need; Sometimes it felt like it was too long, and sometimes it felt like it wasn't long enough; after the 6th time or so you don't really need to see the tutorial again

There were some advantages of using an avatar over a real person, such as feeling more comfortable ("it's just an avatar so I didn't feel judged"), having fewer distractions ("an avatar ... is less distracting...so it helps to focus on the task"), and having a consistent demonstration to follow ("it was good seeing the same thing over and over again"; "Obviously what he did was exactly the same every time whereas it might be slightly different if it's an actual person."). Otherwise, observing the demonstrator avatar was akin to watching a real person; the movements were realistic ("It felt fairly realistic, like watching someone in in terms of the movements."; "The movements were very smooth from the Avatar and it was very easy to follow what the Avatar was doing"). However, the lack of communication was noted:

if I have the chance to ask questions from the person, yeah, I would prefer to have a person because besides looking, I could make some questions.; so I couldn't be like oh stop. I want to see exactly, like turn it around. I want to see exactly what you're doing, or can you do that again? It's just she did it

Self-avatars present some disadvantages to learning. Most participants felt that their self-avatar resembled them ("it was pretty accurate to my real appearance, I didn't expect it to have like the same outfit, the same like hair, face, It's pretty cool"). Some liked having a self-avatar ("I think I related more with the whole experience just because I have an avatar similar to me"; "Nice to have an avatar that similar to me."), but viewing a self-avatar was not always positive, evoking uncanny valley effects ("I thought it something just looked kind of strange"; "I was both kind of repulsed and amazed at the same time"), and it became a distraction:

it was quite creepy, I probably wish it didn't resemble me. because again... sort of... I would almost judge myself against that: It should be me doing this better than I do!; In the beginning it was distracting cause I was trying to like compare between myself and the avatar. Yeah. And then towards the end I was like I actually want to complete it; I think I was just shocked cause like ohh that's me

For others, the novelty of having a self-avatar wore off quickly ("I didn't really care after the first time that it looked like me, so it was pretty normal"). Comparatively, minimal and dissimilar avatars were not distracting:

I can't remember the [minimal] instructor particularly well, I think I was mainly focused on the on the actual blocks rather than the instructor; I wasn't really concentrating on the [dissimilar] instructor themselves to be honest I was just very focused on the on the puzzle. So yeah, it could have been anything, anyone or anything sitting there.

DISCUSSION
Our findings demonstrate that VR training was successful for learning an assembly task, with the majority (77-89%) of participants able to complete the puzzle in the VR retention test after the training. This is in keeping with prior work which has shown the effectiveness of VR training for acquiring fine psychomotor skills [34,82,125], and participants across all conditions commented on the effectiveness of training virtually. Whilst existing VR training applications almost exclusively use active learning [2,4,11,88,95,99,125], which is generally considered more effective than observational [112], our findings indicate that observational learning combined with practice is highly effective as a learning approach in VR. All participants assembled the puzzle in the practice phases, but those in the active learning condition received twice as much 'hands-on' experience because they also assembled the puzzle during the training phases. In contrast, participants in the observational learning conditions observed the puzzle being assembled during the training phases. Despite this, we were unable to detect significant differences between the active and observational conditions (RQ1). Our analysis reveals that, with 95% confidence, any differences in overall success rate for the retention task in VR would result in a medium effect size at most. These similarities apply not only to the performance during the retention task, but also to user experience as measured by enjoyment, perceived competence, presence felt, and physical and mental workloads placed on the users.
Similarly, we found no significant differences between active and observational learning for transferring skills to the real world when the task is exactly the same. While this does not necessarily mean they are equivalent, it does show that they are both effective for acquiring fine psychomotor skills in VR, in line with related work suggesting a degree of functional equivalence between action and observation [53,80,94,119]. However, we provide evidence that observational learning significantly improves the ability of users to transfer their learning to real-world tasks beyond the context learned in VR compared with active learning (RQ2). The odds of succeeding in the far transfer tests were between 6.6 and 7.9 times higher for those in the observational learning conditions than for those in the active condition, with a large effect size for each of the individual observation conditions. We also observed that this effect was immediate for two of the observational conditions (dissimilar and minimal) compared with active learning. This phenomenon has also been demonstrated outside of VR training, indicating that observing contributes to learning in a way that allows the individual to apply their skills more easily to variations of a task [85,109,112]. This has important implications for VR training because operating in an unchanging environment is extremely rare, and real-world transfer is important in the majority of tasks where VR training is already deployed, or will likely be deployed in the future [14,69,82,92]. This also has important methodological implications for VR training research because most prior VR training studies focus only on measuring near transfer as an indicator of success [2,14,34,44,82,92]. Our results demonstrate that far transfer success is much lower than near transfer success across all conditions, yet far transfer is more applicable to industry, where these techniques will be deployed. Therefore, integrating and studying far transfer as part of VR training research methodology should be prioritised.
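To make the odds figures above easier to read, the following minimal sketch shows how an odds ratio is obtained from raw success counts. It is illustrative only: the function names and the counts are hypothetical and are not the study's data, and the ratios reported above may come from a fitted statistical model rather than from this direct calculation.

```python
def odds(successes: int, total: int) -> float:
    """Odds of success: number of successes divided by number of failures."""
    failures = total - successes
    if failures == 0:
        raise ValueError("odds are undefined when there are no failures")
    return successes / failures


def odds_ratio(succ_a: int, total_a: int, succ_b: int, total_b: int) -> float:
    """Odds of success in group A relative to group B."""
    return odds(succ_a, total_a) / odds(succ_b, total_b)


# Hypothetical counts for illustration only (not taken from the study):
# group A: 12 of 20 succeed; group B: 4 of 20 succeed.
example_ratio = odds_ratio(12, 20, 4, 20)
```

Note that an odds ratio is not a ratio of success rates: in the hypothetical example the success rate is three times higher in group A (60% versus 20%), while the odds are six times higher.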
The far transfer benefits of observational learning may be explained by the cognitive processes that occur during learning [112]. This can be seen in the fact that a dissimilar avatar leads to significantly stronger external visual imagery for the task than active learning, which indicates that there are differences in the immediate cognitive effects of the learning processes, which are strongly correlated with performance [32,54,76]. Additionally, attentional resources available during learning are likely to play a role in explaining these differences. In the beginning, when the task is unfamiliar, observational approaches allow users to direct their attention to the requirements of the task and focus on cognitively understanding the complex nature of the puzzle [127]. In contrast, active learning is more likely to direct cognitive resources into physically completing the task, which may affect users' cognitive understanding of the procedure as a whole. Additionally, the demonstrator avatar consistently shows participants an effective series of actions for assembling the puzzle in the observational conditions, which could result in users adopting the same strategy early on in the learning process [112]. This avoids issues that can occur with active learning, where it is more likely that users develop unhelpful techniques before developing an effective refined strategy [8,15,127].
In line with prior work, our results show significant skill decay across all conditions after a 10-14 day delay between training and performing the tests [82]. Only 28-39% of participants were able to completely assemble the puzzle in the delayed VR retention task, representing a nearly 50% drop in success rates, and significant decay was also observed in the near-transfer tasks. The significant deterioration in participants' ability to imagine completing the task from an external, internal, and kinaesthetic perspective after the delay demonstrates that they forget how to perform the skill. This reiterates the importance of repeating training and not allowing long periods of time to pass between being trained virtually and utilising the skill [38,61]. However, there was no significant decay in far transfer skills. This is likely due to the overall poor performance in the far transfer tests immediately following the training. For example, only 15% of those trained in the active condition were able to successfully assemble the same puzzle when the colour-coding, set orientation, and distraction-free environment were removed.
The observational learning condition included a hands-on practice component, so we cannot draw conclusions about how much of the skill was gained purely from observation. Prior research suggests that observation without practice would be inferior to active training [112], as gaining 'hands-on' experience is an essential part of learning psychomotor skills [29]. Including practice in the observational group has likely increased the learning gains; however, any differences observed are likely due to the active versus observational training approach because the practice phases were the same for both conditions.
We observed high variability in performance across participants and, therefore, within the groups. One potential explanation is that most participants had little to no prior VR experience, which has been known to impact performance in VR [108]. However, prior VR experience was not correlated with performance, suggesting that the initial familiarisation scene sufficiently mitigated the effects of prior experience. Mental rotation ability is another factor that likely explains the high variability in performance, with those scoring highly more likely to succeed in an assembly task [42]. We balanced the groups to avoid this becoming a confounding variable.
We provide the first evidence about the implications of applying feedforward learning for fine psychomotor tasks in VR, with surprising results. In contrast to prior work [27,28,33], we find no significant effects of avatar similarity on learning, and it is likely that only small effects exist. However, qualitative insights reveal that using fully customised, more realistic representations of the user in observational learning was more likely to produce uncanny valley and novelty effects, which can distract the user and prevent them from focusing on the learning task [81,123]. These effects of self-avatars are important to note, as few people will have been exposed to highly realistic digital models of themselves, and therefore this phenomenon would also likely manifest in industrial VR training applications. Therefore, avoiding similarity between the user and the demonstrator avatar altogether could be the most appropriate approach for observational learning of fine psychomotor skills in VR (RQ3). The contrast between our findings and related work [27,33] is likely due to the difference between learning gross and fine psychomotor skills, where the emphasis shifts from looking at the avatar as a whole performing full body movement to focusing on just the hands performing a manual task. It could be that interest in the avatar's appearance draws the user's attention [93,121,122] away from the puzzle task, and this inhibits any possible feedforward learning benefit.

Limitations & Future Work
We provide a first comparison of active and observational learning techniques in VR. Whilst we provide evidence to show the advantages of implementing observational methods in combination with practice in VR, we found that providing an animated avatar to demonstrate a skill currently relies on an intensive workflow, often requiring state-of-the-art motion capture systems such as the one utilised in this study and other related work [33]. However, markerless motion capture and animation technologies continue to improve (e.g. MoveAI), and we anticipate that observational learning will become easier to develop and more scalable to implement.
We selected the Burr puzzle task as an example of an assembly task because it requires the same skilled elements (e.g. part recognition, selection, rotation, aligning, and fixing) as real-world industrial assembly tasks (e.g. electronic actuator assembly [34], pump maintenance [125]). However, Burr puzzles are arguably more abstract than most real-world assembly tasks, so the question of how far active and observational learning in VR transfer to real-world tasks is a direction for future work. Our task was neither too easy nor too hard for participants: most were unable to perform it immediately and almost all were able to perform it by the end. However, future work should explore VR training for tasks with varying levels of complexity.
Interactions with the puzzle pieces were achieved using controllers, which are generally preferred over freehand interaction in VR training applications [104]. Whilst this proved to be an acceptable proxy, some participants expressed a desire for full finger dexterity to enable more fine-grained control and manipulation. Full finger dexterity would become especially important for learning more complex fine psychomotor tasks in VR (e.g. those that contain smaller pieces or finer movements), and therefore integrating alternative haptic interaction methods (e.g. SenseGlove, Manus, HaptX) into VR training may be necessary in the future.

CONCLUSION
Using a Burr puzzle manual task, we compared both active and observational learning approaches for retention, near transfer, and far transfer tasks and conclude that:
(1) Observational learning in VR can be effective for learning a manual task when combined with 'hands-on' practice.
(2) Observational learning yields significantly better results in real-world far transfer tasks.
(3) Avatar similarity does not appear to increase learning of fine psychomotor tasks in VR.
(4) Learning in VR decays over prolonged periods independent of learning approach.
Observational learning is an effective VR training approach that can have significant benefits, especially for tasks whose real-world equivalents involve changes in context and environmental factors. Industry stakeholders and researchers alike are well advised to consider incorporating observational learning elements into VR training. Assessing the far transfer abilities of learning approaches is also important for researchers to understand the applicability of VR training approaches.

Figure 1: We compared active learning of a Burr puzzle task (a) with observational learning (b) in VR. For observational learning, we further compared three levels of similarity of the demonstrator avatar to the user (c): self avatars (d), minimally similar (e), and dissimilar avatars (f). Both active and observational learning approaches were effective; however, observational learning significantly improved the ability of users to transfer their learning to real-world tasks beyond the context learned in VR (g) compared with active learning. We did not find any significant effects of avatar similarity on learning.

Figure 2: Participants were given 5 hair colour options (a), 4 skin tone options (b), a male/female body type, and a young adult/older adult appearance to choose from. Self-avatars (c2a, c2b) were created using the participant's photograph (c1a, c1b). Minimal avatars (c3a, c3b) were created using the selected skin tone, body type, and age. Dissimilar avatars (c4a, c4b) were created using contrasting features, e.g. other body type, age category, different skin tone and hair colour.

Figure 3: To test VR training of a fine psychomotor skill, participants were tasked with assembling an interlocking 3D puzzle known as a Burr puzzle. The Burr puzzle configuration involves 6 unique pieces (a), and there is only one solution, which requires 5 steps to assemble (b).

Figure 4: Participants learned to assemble the puzzle in VR. A familiarisation scene (a) allowed users to learn the controls before beginning the training; they could also see their own avatar in the virtual mirror. In the training environment, participants observed an avatar assembling the puzzle (b) or were given active instructions (c). During the non-guided practice elements of the training, minimal feedback was provided in the form of highlighting around the pieces: red indicates the pieces do not go together, green indicates that they do (d). In the test environment, participants completed the puzzle assembly unaided and without feedback (e).

Figure 5: Participants were tested on transferring their skills to the real world. In a near transfer test, the same colour-coded pieces were arranged exactly as they were in VR (a). In the far transfer test, the pieces were not colour-coded, they were arranged differently, and participants had to listen to an audio track at the same time (b).

4.4 Apparatus & Set-up
4.4.1 Hardware. For the VR elements of the study, we used a Valve Index VR System powered by a PC with an Intel i7-9900K processor, an RTX 2080 Ti GPU and 32GB of RAM, running Windows 10.

Table 2: The number of participants who successfully completed the Burr puzzle in each task is reported below. The percentage is calculated using the total number of participants for that condition.

Table 3: The Burr puzzle required users to assemble six pieces. The median number of pieces assembled and the interquartile range for the different tasks are reported below.