Snap, Pursuit and Gain: Virtual Reality Viewport Control by Gaze

Head-mounted displays let users explore virtual environments through a viewport that is coupled with head movement. In this work, we investigate gaze as an alternative modality for viewport control, enabling exploration of virtual worlds with less head movement. We designed three techniques that leverage gaze based on different eye movements: Dwell Snap for viewport rotation in discrete steps, Gaze Gain for amplified viewport rotation based on gaze angle, and Gaze Pursuit for central viewport alignment of gaze targets. All three techniques enable 360-degree viewport control through naturally coordinated eye and head movement. We evaluated the techniques in comparison with controller snap and head amplification baselines, for both coarse and precise viewport control, and found them to be as fast and accurate. We observed a high variance in performance which may be attributable to the different degrees to which humans tend to support gaze shifts with head movement.

In Virtual Reality (VR), head tracking via head-mounted displays (HMDs) allows users to naturally explore their surroundings by turning their heads. However, users may not always be able to perform such a movement easily or comfortably. This could be due to a posture that prevents them from doing so, for example, lying down [51]; due to environmental constraints, for example, in public transport [3]; or due to an injury or disability [53]. While the default viewport control is based on an absolute mapping of head movement to display rotation, alternatives exist. These include the use of controllers to manually control the viewport, requiring no head movement [27,35], as well as hands-free techniques that rely on amplification of head movement to move the viewport further with less movement. In this work, we explore gaze as an alternative modality for viewport control.
In considering gaze, we first need to understand how it operates. Gaze represents the direction in which we look and is based on a combination of eye and head movement. Gaze shifts over small amplitudes are possible without head movement, while large gaze shifts necessitate head rotation to bring targets into view. In a range of up to 45° from the head forward direction, gaze shifts are possible without any head movement, although shifts over 15-20° typically involve head movement to keep the eyes within a comfortable eye-in-head position [39]. Any supporting head movement is slower than the movement of the eyes and results in gaze fixations during which the head is still turning toward the target, compensated by vestibulo-ocular reflex (VOR) eye movements in the opposite direction. Other specific eye movement effects occur when gaze fixation is on objects that are in motion in the visual field, triggering smooth pursuit as a closed-loop behaviour to remain on target [54]. For the design of viewport control by gaze, it is important to reflect how natural eye-head coordination translates to HMDs. Gaze shifts are relative to the world and can involve a combination of eye and head movement. As the head is coupled with the display, part of the gaze shift will then result in viewport rotation, and part in a change of gaze angle relative to the display.
We introduce three novel techniques for viewport control by gaze, illustrated in Figure 1: Dwell Snap, Gaze Gain and Gaze Pursuit. All three techniques are designed so that it is possible to control the viewport solely by eye movement, without any head movement. However, head movement is in no way inhibited and can naturally support the gaze. As any head movement supporting gaze is slower than the associated eye movement, this naturally facilitates the use of head movement to refine the viewport orientation, following any larger rotation of the viewport triggered by gaze. Our individual techniques leverage gaze based on different types of eye movement:
• Dwell Snap is based on prolonged fixation, akin to holding a button down, to trigger viewport rotation in discrete steps (Figure 1a).
• Gaze Gain is based on saccades and uses a transfer function with gaze angle as input for amplified rotation of the viewport (Figure 1b).
• Gaze Pursuit is based on pursuit and rotates the view towards the direction of the user's gaze, for the user to follow any target until it becomes centrally aligned in the field of view (FOV) (Figure 1c).
We compared our three new techniques with controller and head amplification baselines on an abstract task adopted from peephole pointing literature [6]. Participants were effective with gaze for viewport control, and performed as efficiently with our three novel techniques as with the existing techniques. The results demonstrate the viability of gaze for viewport control, while our techniques provide novel and diverse ways in which surround viewing can be supported. This has practical significance as it enables hands-free camera control with limited or no head movement. Gaze is relevant for accessibility but can also be preferable for viewport control when controllers are available, to focus manual input on other tasks. Dwell Snap, Gaze Gain and Gaze Pursuit each differ in how they enable viewport control, and their respective affordances may support entirely different types of applications, from smooth browsing to fast switching of views.
The core contributions are as follows: • The implementations of Dwell Snap, Gaze Gain, and Gaze Pursuit as techniques for gaze-based viewport control, including optimizations dealing with characteristics of the human visual system.

RELATED WORK
This research builds upon previous work on travel in VR, viewport control and gaze-based interaction in head-mounted displays.

Travel in Virtual Reality
Travel refers to the task of moving from the current location to a new target location or in a desired direction and is an essential task in VR [22]. However, since VR is commonly experienced in limited spaces relative to the size of the virtual environment, and the user does not have a view of their physical environment, travel techniques must be intuitive, energy efficient, and efficient in space-constrained settings. As such, developing interactive techniques to address these challenges is an extensive field of research [4,9,56].
Travel techniques can vary significantly and have been developed for a variety of modalities, such as controllers [27], head [49], voice [14] or gaze [28]. Techniques that redirect the user into open physical spaces without the user noticing are also a large field of research to overcome physical restrictions [32,33,46]. Travel can be categorized into two actions: viewport control and locomotion [22]. Viewport control is used to control the FOV (in HMDs, commonly by head rotation) to navigate the view in the direction of movement. Locomotion refers to the act of moving in space. Due to the common limitation of space, locomotion has received the most interest from researchers, and it is commonly assumed that users have enough space and the ability to control their view with their heads [9]. However, the range of movement of the user's head can be limited in certain postures such as sitting [24,39] or lying [51], or due to other factors such as disabilities or injury [13]. In such cases, users are not able to perform viewport control with the head alone, and support from alternative techniques is needed to give users the full range of motion in the virtual environment. In this work, we investigate how gaze can be leveraged for viewport control with limited head movement.

Viewport Control
Viewport control techniques were originally developed to overcome the small FOV of early head-mounted displays [15]. Early techniques used controllers to allow easy change of the viewport with little head movement [27] and are still popular today to minimise effort and increase navigation efficiency [35]. Hands-free approaches to viewport control have also been proposed; the most common technique is head amplification, where the movements of the head are amplified to allow the user to reach further with head movement alone than physically possible [15,21,30,34,48,57]. Researchers have also investigated systems that automatically control the viewport of the user in storytelling settings [12]. Finally, techniques have been proposed that show the entire environment within the user's field of view, either by expanding the user's field of view with a 360° camera [1], or by overlapping multiple views [36]. While previous techniques have used gaze to steer the direction of locomotion [28], this work is the first to investigate eye tracking for viewport control in VR HMDs.
Several works have investigated gaze-based navigation for desktop applications as an accessibility tool or to make navigation more efficient. Researchers have proposed gaze-based one-dimensional scrolling techniques for applications such as websites [18,25], documents [31,38] or code navigation [37], and gaze-based two-dimensional panning techniques for 2D applications such as maps [31]. Gaze control has also been proposed for camera control in 3D video games [2,52,53]. The most common interaction metaphor for such techniques is that the user gazes at the edge of the screen to move in that direction [18]. Such metaphors are usually also combined with a clutch such as winking [31] to avoid accidental activation. We take inspiration from these desktop-based works in the development of our VR-based techniques.

Gaze Interaction in Head-Mounted Displays
Although novel for viewport control, gaze is widely investigated for pointing in HMDs [19,29,40,47,55]. Viewport control can be regarded as pointing of the virtual camera, but the tasks differ as pointing is relative to the view for selection on the interface. Gaze pointing at targets that are presented further than 15-20° visual angle from the display center is typically supported by head movement, which in HMDs implicitly aligns the target more centrally in the view [39]. Gaze pointing at targets that are initially outside the display's FOV presents an interesting case as it requires view panning to reveal the target before it can be acquired, also known as peephole pointing. When the same modality is used for both panning and pointing, the task becomes equivalent to pointing the camera at the target. A peephole pointing task therefore lends itself to evaluation of viewport control, typically with a larger tolerance for alignment in a comfortable viewing area, although more precise control can also be desirable. For this work, we adopt a peephole pointing protocol introduced by Cao et al. [6], to evaluate gaze for coarse as well as precise control of the viewport.
A recent study by Sidenmark et al. compared gaze, head, and hand for peephole pointing in head-mounted displays and is noteworthy as it relied on the same task we use in this work [41]. Their work, however, relied on conventional viewport control by head movement for the initial phase of bringing targets into view. The comparison of gaze, head and hand focused on the second phase of acquiring a target once it has appeared in view, for which gaze and hand outperformed the head due to their independence from display movement. In contrast, we compare gaze, hand and head for viewport control without any separate pointing phase or modality.

GAZE-BASED VIEWPORT CONTROL TECHNIQUES
In this section, we introduce three new techniques for gaze-based viewport control: Dwell Snap, Gaze Gain, and Gaze Pursuit. In the following, we first outline conventional techniques to control the viewport in VR (Controller Snap and Head Gain) as the baseline techniques. After that, we explain the three new techniques in detail. Dwell Snap is a modified version of Controller Snap. Instead of a button press to trigger a viewport snap, it uses the horizontal eye-gaze angle as a trigger.

Baseline techniques
3.1.2 Head Gain. Head Gain amplifies viewport rotations (cf. Section 2.2). With this technique, the rotation of the user's head (coupled to the HMD) is amplified so that the visually perceived rotation is larger than the actual head rotation. The amplification factor (gain) is commonly defined as $g := \theta_{\text{virtual}} / \theta_{\text{real}}$, where $g$ is the final gain, $\theta_{\text{virtual}}$ is the rotation of the virtual camera, and $\theta_{\text{real}}$ is the rotation of the user's head (and by that, the HMD) [44]. Contrary to Controller Snap, Head Gain is a hands-free technique. Depending on the gain, it requires relatively small head movements. Still, these movements might not always be possible (e.g., for people with restricted movement or when lying down).
For our study, Head Gain uses a constant gain of 4 in the direction of the HMD rotation. This value was chosen to be consistent with the gain of Gaze Gain to allow for comparison (cf. Section 3.3).
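The gain definition above can be illustrated with a minimal sketch. The function name and the wrapping of the result to [-180°, 180°) are assumptions made for this example and are not taken from the study implementation.

```python
def head_gain_viewport_yaw(head_yaw_deg: float, gain: float = 4.0) -> float:
    """Return the virtual camera yaw (degrees) for a physical head yaw.

    Hypothetical helper: applies a constant gain and wraps the result
    to [-180, 180) so that a full virtual turn maps back onto itself.
    """
    yaw = gain * head_yaw_deg
    return (yaw + 180.0) % 360.0 - 180.0


# With a gain of 4, a 10 degree head turn rotates the viewport by 40 degrees,
# and a 45 degree head turn is enough to look directly behind.
print(head_gain_viewport_yaw(10.0))
print(abs(head_gain_viewport_yaw(45.0)))
```

With a gain of 4, a physical head range of ±45° therefore covers the full 360° of the virtual scene.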

Dwell Snap
With Dwell Snap (inspired by Controller Snap), the viewport snaps in the direction of the horizontal eye-gaze angle if it is above a fixed angular threshold for a predefined period of time. It keeps snapping as long as the eye gaze remains above the angular threshold. However, the time threshold is reduced for subsequent snaps. The higher initial threshold followed by a lower value reduces accidental activation of the snapping while ensuring faster repetition after initial activation. For our study, the horizontal angular threshold for the trigger was set to 25°, as prior work has shown that natural gaze shifts rarely exceed this range [8,39]. The time threshold for the trigger was set to 400 ms for the first snap and 200 ms for each subsequent snap if the eye-gaze angle remained above the angular threshold. This means that the user has to look 25° to the left or right for 400 ms to activate Dwell Snap and perform the first 22.5° camera rotation. All subsequent 22.5° rotations happen after 200 ms each. The camera rotates immediately for each snap, without a smooth transition, providing a similar experience to a teleport. The continuous snapping ends when the participant's eye gaze moves back within the horizontal angular area of [-25°; 25°].
When selecting the angle for rotation, we settled on 22.5°. While a larger value is technically possible, we found that with larger values (e.g., 45° as in Controller Snap), users frequently rotated their heads. We suspect this was either the user checking whether they had rotated enough to see the target or an attempt to regain orientation. To avoid this issue, we checked 1, 2, 4, 8, and 16 ($2^n$) as divisors of 360° and selected the largest resulting angle that did not show it (360°/16 = 22.5°). That means participants need 16 snaps to perform a full 360° viewport rotation if they keep their head still (head movement still contributes to viewport control without gain). As a comparison, Controller Snap requires 8 snaps for a full rotation.
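The trigger logic described above can be sketched as a small per-frame state machine. This is an illustrative sketch, not the study implementation: the class and attribute names are hypothetical, the eye yaw is assumed to be the horizontal eye-in-head angle in degrees, and resetting the dwell timer to zero after each snap is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DwellSnap:
    angle_threshold_deg: float = 25.0  # eye angle that arms the trigger
    first_dwell_s: float = 0.4         # dwell time before the first snap
    repeat_dwell_s: float = 0.2        # dwell time before each further snap
    snap_deg: float = 22.5             # viewport rotation per snap
    _dwell_s: float = 0.0              # time spent beyond the threshold
    _snapped: bool = False             # True once the first snap has fired

    def update(self, eye_yaw_deg: float, dt_s: float) -> float:
        """Advance one frame; return the rotation (degrees) to apply now."""
        if abs(eye_yaw_deg) <= self.angle_threshold_deg:
            # Gaze is back in the central region: stop snapping and re-arm.
            self._dwell_s = 0.0
            self._snapped = False
            return 0.0
        self._dwell_s += dt_s
        required = self.repeat_dwell_s if self._snapped else self.first_dwell_s
        if self._dwell_s >= required:
            self._dwell_s = 0.0  # assumption: restart timing after each snap
            self._snapped = True
            return self.snap_deg if eye_yaw_deg > 0 else -self.snap_deg
        return 0.0


# Hold the gaze 30 degrees to the right for one second (64 frames).
snap = DwellSnap()
total = sum(snap.update(30.0, 1.0 / 64.0) for _ in range(64))
print(total)  # accumulated viewport rotation in degrees
```

The returned rotation would be added to the viewport yaw on top of the unamplified head rotation.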

Gaze Gain
Head Gain modifies the rotation of the virtual camera by applying a gain to the horizontal rotation of the user's head (yaw angle). Gaze Gain extends this technique by integrating not only horizontal head rotations but also horizontal eye rotations. Beyond that, there is another important difference between Head Gain and Gaze Gain: in Head Gain, the rotation of the viewport is always equal to the rotation calculated via the gain factor and the head rotation. In Gaze Gain, the rotation angle of the viewport is determined by the combined eye and head rotation, a gain factor that amplifies it, and a variable angular velocity.
The introduction of a variable angular velocity was necessary when introducing eye rotation into the technique. This is to deal with three characteristics of the human visual system: firstly, the extreme difficulty of canceling a saccade mid-movement [17]; secondly, the fact that humans can only perform smooth pursuit up to a certain speed [26]; thirdly, the reflexive nature of the eyes to saccade, fixate, and follow points of interest (POI) before saccading back to their original position as the viewport rotates (similar to an optokinetic response [20]). Without the variable angular velocity limiting the viewport velocity, the scene will visually move too rapidly. This causes saccades to miss their landing frequently, as the landing spot of the saccade would have visually moved. In this case, the human eye would have to catch up during smooth pursuit, introducing additional saccades ("catch-up saccades"). Additionally, the third point would lead to the viewport movement being uncontrollable and jerky (or perceived stuttering of the viewport), aggravating cybersickness and disorientation [42,43].
To deal with these challenges while also keeping the time needed for a viewport alignment short, we can formulate three objectives for Gaze Gain:
(1) To minimize the magnitude of viewport motion during saccades.
(2) To minimize the jerkiness of the viewport motion.
(3) To minimize the duration of viewport rotation.
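We approach these challenges by modifying the angular velocity and integrating it into calculating the new viewport angle. To do this, we calculate a new velocity $\omega$ for every frame as follows:

$$\omega = \begin{cases} \omega_{in} & \text{if } |\Delta\theta| \le \theta_{in} \\ \omega_{in} + \dfrac{|\Delta\theta| - \theta_{in}}{\theta_{out} - \theta_{in}}\,(\omega_{out} - \omega_{in}) & \text{if } \theta_{in} < |\Delta\theta| < \theta_{out} \\ \omega_{out} & \text{if } |\Delta\theta| \ge \theta_{out} \end{cases} \tag{1}$$

where:
$\Delta\theta$ = unmodified new viewport angle − current viewport angle
$\theta_{in}, \theta_{out}$ = tuning parameters determining inflexion points
$\omega_{in}, \omega_{out}$ = tuning parameters determining the rotational speed at the respective inflection point
$\Delta t$ = time elapsed between the previous and current frame

The unmodified absolute new viewport angle $\theta_{target}$ is the viewport angle we would get if we did not consider viewport velocity, calculated with:

$$\theta_{target} = g \cdot (\text{yaw}_{head} + \text{yaw}_{eye})$$

Regarding $\Delta\theta$, the current viewport angle $\theta_{cur}$ is the direction in which the viewport is currently oriented. With that, $\Delta\theta = \theta_{target} - \theta_{cur}$ is the viewport displacement without velocity modification.
The new $\omega$ is then used together with $\Delta\theta$ to calculate the final new viewport angle per frame:

$$\theta_{new} = \theta_{cur} + \operatorname{sgn}(\Delta\theta)\,\omega\,\Delta t$$

where $\Delta t$ is the time elapsed since the last frame. Equation 1 and the meaning of the tuning parameters are visualized in Figure 2. $\theta_{in}$ and $\theta_{out}$ are the angular thresholds at which $\Delta\theta$ would lead to an angle "in view" or "out of view" (note: $\Delta\theta$ is the difference between the current and the unmodified new viewport angle; "view" means the current visual FOV). If $|\Delta\theta|$ is smaller than $\theta_{in}$, we consider it to be in view. Vice versa, if $|\Delta\theta|$ is larger than $\theta_{out}$, we consider it to be out of view. $\omega_{in}$ and $\omega_{out}$ are the angular speeds when $\Delta\theta$ is "in view" and "out of view", respectively. For all values between $\theta_{in}$ and $\theta_{out}$, we use linear interpolation between $\omega_{in}$ and $\omega_{out}$ to reduce jerkiness. With this, we can achieve our objectives (1)-(3): when $\Delta\theta$ is "in view", viewport rotation is slow but upper bounded by the smooth pursuit speed $\omega_{in}$. However, when $\Delta\theta$ is "out of view", the camera may rotate as fast as it needs, using $\omega_{out}$, minimizing the duration of the rotation. In between, we calculate $\omega$ by linearly interpolating between $\omega_{in}$ and $\omega_{out}$, which reduces perceived jerkiness. That is because the minimal angular scene displacement detectable by humans increases linearly with saccade length (cf. Li and Matin [23]), and during development, we found it to be similar to perceived jerkiness.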
Finally, the gain was chosen to be 4, as it is the smallest integer that allows looking directly behind without any head movement: gaze shifts of up to 45° are possible with the eyes alone, and 4 × 45° = 180°. Note, to allow for a proper comparison, we used the same gain value in Head Gain (cf. Section 3.1.2).
Together, this means that on every frame, we calculate a viewport velocity based on the remaining amount the viewport has to rotate to reach 4 times the combined head and eye yaw angle.
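The per-frame update summarised above can be sketched as follows. This is a sketch under stated assumptions rather than the study implementation: the function names are hypothetical, the threshold and speed values are placeholders (the tuning values used in the study are not given here), and clamping the step so that the viewport does not overshoot its target is an added assumption.

```python
import math

# Placeholder tuning values, chosen only for illustration.
THETA_IN_DEG = 20.0     # remaining rotation treated as "in view"
THETA_OUT_DEG = 55.0    # remaining rotation treated as "out of view"
OMEGA_IN_DEG_S = 30.0   # slow speed, meant to stay within smooth pursuit
OMEGA_OUT_DEG_S = 360.0 # fast speed for targets that are out of view


def gaze_gain_speed(delta_deg, theta_in, theta_out, omega_in, omega_out):
    """Viewport speed (deg/s) as a piecewise-linear function of |delta|."""
    d = abs(delta_deg)
    if d <= theta_in:
        return omega_in
    if d >= theta_out:
        return omega_out
    t = (d - theta_in) / (theta_out - theta_in)
    return omega_in + t * (omega_out - omega_in)


def gaze_gain_step(viewport_deg, head_yaw_deg, eye_yaw_deg, dt_s, gain=4.0):
    """Return the viewport yaw after one frame of duration dt_s."""
    target = gain * (head_yaw_deg + eye_yaw_deg)  # unmodified new angle
    delta = target - viewport_deg                 # remaining rotation
    speed = gaze_gain_speed(delta, THETA_IN_DEG, THETA_OUT_DEG,
                            OMEGA_IN_DEG_S, OMEGA_OUT_DEG_S)
    step = min(abs(delta), speed * dt_s)          # assumption: no overshoot
    return viewport_deg + math.copysign(step, delta)


# Eyes held 10 degrees to the right with a still head: the viewport
# approaches 4 * 10 = 40 degrees over successive 90 Hz frames.
viewport = 0.0
for _ in range(900):
    viewport = gaze_gain_step(viewport, 0.0, 10.0, 1.0 / 90.0)
print(round(viewport, 6))
```

In practice the eye and head yaw would change from frame to frame, so the target angle moves while the viewport is still rotating towards it.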

Gaze Pursuit
Gaze Pursuit, inspired by Zhang et al. [59], continuously rotates the viewport in the direction of the eye in the horizontal direction (yaw). When an object comes into view, and the user wants to stop (a point of interest), they naturally fixate on the point and smooth pursuit to the centre. In simple terms, that means that if the user looks with their eye to the left, the viewport rotates to the right until the user stops looking to the left but at the centre.
To avoid accidental rotations, we implemented a deadzone of [-5°; +5°] in the centre of the FOV (to cover the paracentral field of vision, 5°-9° [45]). This concept is similar to analogue joystick deadzones² and also similar to the threshold in Dwell Snap (cf. Section 3.2). Within this area, eye movements do not lead to viewport rotations. The deadzone serves two primary purposes: first, to establish a neutral position that induces no viewport rotation and allows for scene exploration; second, to deal with the inaccuracy of the eye tracker, which causes, despite calibration, a small offset between where the user is actually looking and where the eye tracker believes the user is looking.
Similar to Dwell Snap, head movements still contribute to viewport control but without gain. Similar to Gaze Gain, the velocity of the viewport rotation is adapted on every frame and then used to calculate the new viewport angle. Inspired by previous work [59], we chose a linear velocity function to calculate the viewport velocity $\omega$, calculated per frame as follows:

$$\omega = \operatorname{sgn}(\delta)\, k\, \max(|\delta| - d,\ 0), \qquad \theta_{new} = \theta_{cur} + \omega\,\Delta t$$

where:
$\omega$ = viewport angular velocity
$k$ = tuning parameter for gradient
$\delta$ = $\text{yaw}_{e} - \text{yaw}_{h}$
$d$ = dead-zone radius
$\Delta t$ = time between the previous and current frame

With that, the larger the difference between the eye's and head's yaw angles per frame, the larger the velocity³.
For our study, $k$ was set to 5. During development, we found that higher values for $k$ increase perceived jerkiness, whereas smaller values lead to viewport movements that are too slow to be comfortable.
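A minimal sketch of a linear velocity function with a deadzone is given below. It is an assumption-laden illustration: the function names are hypothetical, and measuring the linear ramp from the edge of the deadzone (rather than from the centre of the view) is one plausible reading of the description, not a confirmed detail of the study implementation.

```python
import math


def gaze_pursuit_velocity(eye_yaw_deg, head_yaw_deg, k=5.0, deadzone_deg=5.0):
    """Viewport angular velocity (deg/s) for the current eye and head yaw."""
    delta = eye_yaw_deg - head_yaw_deg            # eye angle relative to head
    excess = max(abs(delta) - deadzone_deg, 0.0)  # zero inside the deadzone
    return math.copysign(k * excess, delta)


def gaze_pursuit_step(viewport_deg, eye_yaw_deg, head_yaw_deg, dt_s):
    """Return the viewport yaw after one frame of duration dt_s."""
    velocity = gaze_pursuit_velocity(eye_yaw_deg, head_yaw_deg)
    return viewport_deg + velocity * dt_s


print(gaze_pursuit_velocity(3.0, 0.0))    # inside the deadzone: no rotation
print(gaze_pursuit_velocity(15.0, 0.0))   # 10 degrees beyond the deadzone
print(gaze_pursuit_velocity(-15.0, 0.0))  # same magnitude, opposite direction
```

As the user pursues a target towards the centre of the view, the eye angle shrinks, the velocity falls off linearly, and the rotation stops once the gaze is inside the deadzone.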

Implementation details
Current eye-tracking technology in consumer-available HMDs outputs noisy data, especially while looking near the edges of the HMD or during blinks. In such cases, the eye tracker will often experience decreased accuracy or loss of tracking. To remedy this, all eye-tracking data is de-noised via a 1€ Filter [7] before being used by any technique. Following the tuning method by Casiez et al. [7], the filter's frequency, fcmin, beta, and derivative cutoff were set to 90, 0.05, 10, and 1, respectively. Temporary loss of tracking is handled by using the last known position of the eye. If eye tracking is lost for an extended period of time (>2 s in the study), all techniques assume the eye yaw angle that minimizes eye-induced camera motion.
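For readers unfamiliar with the 1€ Filter, the following sketch shows a common formulation of it for a single scalar signal such as the eye yaw angle. The class is written for illustration and is not the code used in the study; the mapping of the four reported values to `freq`, `min_cutoff`, `beta`, and `d_cutoff` is an assumption, and published implementations differ in small details (for example, whether the derivative is taken from the raw or the filtered previous sample).

```python
import math


class OneEuroFilter:
    """Speed-adaptive low-pass filter for one scalar signal (sketch)."""

    def __init__(self, freq=90.0, min_cutoff=0.05, beta=10.0, d_cutoff=1.0):
        self.freq = freq              # sampling rate in Hz
        self.min_cutoff = min_cutoff  # cutoff (Hz) when the signal is still
        self.beta = beta              # how strongly speed raises the cutoff
        self.d_cutoff = d_cutoff      # cutoff (Hz) for the derivative
        self._x_prev = None
        self._dx_prev = 0.0

    def _alpha(self, cutoff):
        # Smoothing factor of a first-order low-pass at the given cutoff.
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau * self.freq)

    def __call__(self, x):
        if self._x_prev is None:
            self._x_prev = x  # first sample passes through unchanged
            return x
        # Low-pass filtered estimate of the signal's rate of change.
        dx = (x - self._x_prev) * self.freq
        a_d = self._alpha(self.d_cutoff)
        dx_hat = a_d * dx + (1.0 - a_d) * self._dx_prev
        # Fast movement raises the cutoff (less lag); slow movement lowers it
        # (less jitter).
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff)
        x_hat = a * x + (1.0 - a) * self._x_prev
        self._x_prev = x_hat
        self._dx_prev = dx_hat
        return x_hat


smooth = OneEuroFilter()
for raw_yaw in [0.0, 0.2, -0.1, 0.1, 12.0, 12.1]:
    print(round(smooth(raw_yaw), 3))
```

Small fluctuations around a fixation are strongly smoothed, while a large gaze shift passes through with little lag.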

EVALUATION
Our study aims to measure how fast and accurately a participant may align the viewport to point in a particular direction with our techniques. We are also interested in the performance of our techniques compared to existing baselines. This includes errors during alignment, cumulative eye and head angle, and task completion time.

Study Design
To accommodate the large variety of scenarios that require viewport alignment, we use an abstract task that entails fine and coarse viewport alignment towards both an initially known and an unknown direction, separated by a range of possible angles (or amplitudes). To do this, we modified the task of Cao et al. [6]. In our task, participants sit in an empty, plain room (in VR) and have to rotate the viewport to bring a pillar (the target) into view. In our case, for coarse alignment, they have to rotate the viewport to bring the pillar within 20° of the FOV's centre. For precise alignment, they have to rotate the viewport to align the pillar precisely with the centre of the viewport (cf. detail in Section 4.2).
³ sgn(x) is the sign function. If x < 0, it returns -1. If x = 0, it returns 0, and if x > 0, it returns 1.

Task description
Figure 3 outlines the procedure of one trial for the coarse alignment task (IV2). Before each trial, the participant aligns their torso, head gaze, and eye gaze towards a target positioned in front of them (Figure 3A). This ensures that the data gathered from each trial have the same initial starting direction. Then, for each trial, the participant performs three consecutive viewport alignments, with the second and third being used for data analysis. First, the participant searches for the starting target placed clockwise or counter-clockwise of the initial forward direction (Figure 3B), aligns the viewport so the target is within [+20°; -20°] of the centre of the FOV, and confirms alignment with a button press on the controller. If the viewport is aligned correctly, the target is briefly highlighted with a white border (Figure 3C) and disappears. If the alignment is wrong and the button is pressed, the target is highlighted red, and the participant has to realign until the alignment is correct. Next, the participant has to search for the second target in the opposite direction of the start target and align the viewport (Figure 3D; no prior knowledge of the target location, IV3). Finally, participants have to return to the same position of the start target, which completes the trial (Figure 3E; with prior knowledge of the target location, IV3). Each viewport alignment (Figure 3A, C, D, and E) is confirmed by pressing a button on the controller. The angular distance between the start target and the second target (red and green in Figure 3C) is defined by our IV4 amplitude. All targets are positioned 3 m away from the user and appear as cylinders with a width of 2° visual angle. The targets start at the floor and are 8 m tall (same as the room). The participant is positioned so that the virtual camera is at a height of 4 m.
For the precise alignment task, the second level of IV2 (level of control), the procedure is the same as in the coarse alignment task. The difference is that the alignment is counted as correct only if the centre of the FOV is over the pillar.
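The two alignment criteria can be expressed as a short check on the angular offset between the viewport centre and the target. This is a sketch for illustration: the function names are hypothetical, and treating "over the pillar" as an offset of at most half the 2° target width is an interpretation of the task description.

```python
def angular_offset_deg(viewport_yaw_deg: float, target_yaw_deg: float) -> float:
    """Signed smallest angle from viewport centre to target, in [-180, 180)."""
    return (target_yaw_deg - viewport_yaw_deg + 180.0) % 360.0 - 180.0


def is_aligned(viewport_yaw_deg: float, target_yaw_deg: float, precise: bool,
               coarse_tolerance_deg: float = 20.0,
               target_width_deg: float = 2.0) -> bool:
    """Check a confirmed alignment for the coarse or the precise task."""
    tolerance = target_width_deg / 2.0 if precise else coarse_tolerance_deg
    return abs(angular_offset_deg(viewport_yaw_deg, target_yaw_deg)) <= tolerance


# A target 15 degrees off-centre (across the 0/360 wrap) passes the coarse
# check but fails the precise one.
print(is_aligned(350.0, 5.0, precise=False))
print(is_aligned(350.0, 5.0, precise=True))
```

Wrapping the offset keeps the check correct when the viewport and target yaw lie on either side of the 0°/360° boundary.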

Procedure
At the start of the study, the experimenter welcomed the participant and explained the purpose and procedure of the study. After providing consent, participants filled out a demographics questionnaire asking for age, gender, vision, experience with video games (never, rarely, monthly, weekly, daily), experience with VR (never, rarely, monthly, weekly, daily), and experience with eye tracking (never, rarely, monthly, weekly, daily). For the study, participants were seated on a stationary seat (no wheels, no rotation) and adjusted the VR headset before completing an eye-tracking calibration. As the study commenced, participants performed all task conditions with one technique at a time before moving to the next technique. The order of techniques each participant experienced was counterbalanced via a balanced Latin square.
Before each technique, participants received an explanation of the technique from the experimenter and were allowed to practice the technique by performing a set of 4 coarse alignments as many times as they wished. After each set of 4, the experimenter asked the participant whether they wished to continue practising or start the experiment. Most participants chose not to continue practising, with a rare few choosing to practice an additional time. No participants chose to continue practising more than once. After each set of selection sequences, participants took off the VR HMD and completed questionnaires. Before starting the next selection sequence, participants were asked if they wished to have a break.
After completing all task conditions with one technique, participants were asked if they had any opinions about the technique. The study lasted under 90 minutes.
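One common way to construct a balanced Latin square for an odd number of conditions, such as five techniques, is sketched below. The construction (a standard zig-zag square plus its mirrored rows) and the function names are assumptions for illustration; the exact orders used in the study are not specified here.

```python
from collections import Counter


def balanced_latin_square(n: int) -> list[list[int]]:
    """Return condition orders (rows) for n conditions labelled 0..n-1."""
    rows = []
    for i in range(n):
        row = []
        low, high = 0, 0
        for k in range(n):
            # Zig-zag pattern 0, 1, n-1, 2, n-2, ... shifted by the row index.
            if k < 2 or k % 2 == 1:
                value = low
                low += 1
            else:
                value = n - 1 - high
                high += 1
            row.append((value + i) % n)
        rows.append(row)
    if n % 2 == 1:
        # For odd n, the mirrored orders are needed to balance carry-over.
        rows += [row[::-1] for row in rows]
    return rows


def adjacency_counts(rows: list[list[int]]) -> Counter:
    """Count how often condition a is immediately followed by condition b."""
    return Counter((a, b) for row in rows for a, b in zip(row, row[1:]))


techniques = ["Controller Snap", "Head Gain", "Dwell Snap",
              "Gaze Gain", "Gaze Pursuit"]
orders = balanced_latin_square(len(techniques))
for order in orders:
    print([techniques[i] for i in order])
```

For five techniques this yields ten orders, so a sample of 20 participants can cover each order twice.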

Apparatus
We implemented the scenarios and the individual techniques with Unity 2021.3.14f1. We used an HTC VIVE Pro Eye VR headset for the study, with 110° diagonal FOV, 2880×1600 pixels resolution, and 90 Hz refresh rate, on a computer with an Intel Core i7-12700 CPU, 16 GB RAM, and an NVIDIA GeForce RTX 3070 Ti GPU.

Measures
With our study, we are primarily interested in how well participants are able to control the viewport in fine and coarse-grained alignment tasks. To quantify the effectiveness of our techniques, we measure the following:
• Cumulative head yaw angle per alignment
• Cumulative eye yaw angle per alignment
• Task completion time: The time between the start of an alignment and the final successful confirmation of the alignment.
• Error rate: Participants must align the viewport and confirm alignment. If they confirm without correct alignment, we record this trial as an error.
These measures provide us with objective results about the effectiveness and accuracy of our techniques. In addition, we ask participants to fill out the RAW NASA-TLX [5] to get subjective data on the taskload. Finally, we measure cybersickness using the simulator sickness questionnaire (SSQ) [16].
Figure 3 (partial caption): The participant selects the second target. E: The participant selects the first target again, which ends the task sequence. The target in E is at the same location as the target in C and the user has to rotate the viewport to select it again.

Participants
We recruited 21 participants, one of whom dropped out early. In the end, N = 20 participants took part in our study (10 identified as male and 10 as female). All were recruited from the local university.

RESULTS
Unless otherwise stated, the analysis was performed with a four-way repeated measures ANOVA ($\alpha = .05$) with Technique, Prior Knowledge, Level of Control, and Amplitude as independent variables. When the assumption of sphericity was violated, as tested with Mauchly's test, Greenhouse-Geisser corrected values were used in the analysis. The Shapiro-Wilk test and QQ plots were used to validate the assumption of normality. ART (Aligned Rank Transform) [58] was applied when normality was violated. Bonferroni-corrected post-hoc tests were used when applicable. Partial eta squared ($\eta_p^2$) was used to report effect sizes. For error rate, data points were classified as outliers if they deviated from the group mean by more than 2 × SD and were corrected using winsorization [10].
For cumulative head yaw angle, cumulative eye yaw angle, and task completion time, we performed outlier removal based on amplitude groups following the same procedure, as larger amplitudes naturally lead to longer travel time, which increases all three measures systematically. Subjective data was analyzed with Friedman tests and Bonferroni-corrected Wilcoxon signed-rank tests as post-hoc tests.
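The 2 × SD winsorization of error rates can be illustrated with a short sketch. The function name is hypothetical, and both the use of the sample standard deviation and the clamping of outliers to exactly mean ± 2 × SD are assumptions; the analysis may have used a different variant.

```python
from statistics import mean, stdev


def winsorize_2sd(values: list[float], k: float = 2.0) -> list[float]:
    """Clamp values further than k sample SDs from the group mean."""
    m = mean(values)
    s = stdev(values)  # assumption: sample (n - 1) standard deviation
    lower, upper = m - k * s, m + k * s
    return [min(max(v, lower), upper) for v in values]


# One participant with an unusually high error count in a group of ten.
errors = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.0]
print(winsorize_2sd(errors))
```

Unlike outlier removal, winsorization keeps every data point and only limits how far extreme values can lie from the group mean.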

Error Rate
Figure 4 shows an overview of error rate grouped by Technique, Amplitude, and Task. Variance was high in all cases. Analysis shows a significant four-way interaction (F(8.389, 159.386) = 5.443, p < 0.001, $\eta_p^2$ = 0.223). In pairwise comparisons for each technique, we found higher error rates for precise than for coarse control for specific conditions: with Controller Snap, for tasks with prior knowledge and amplitudes 30° and 60° (all p < 0.043); with Head Gain, for all conditions (p < 0.049) except the 15° NoPK (p = 0.096) and 60° NoPK (p = 0.065); with Dwell Snap, for low amplitude conditions without prior knowledge (15°, 30°, and 45°) (all p = 0.021); with Gaze Gain, for the high (180°) and the low amplitudes (15°, 30°) regardless of prior knowledge (all p < 0.042); and with Gaze Pursuit, for the larger amplitude conditions (120°, 180°) irrespective of prior knowledge (all p < 0.043).

Cumulative Head Yaw Angle
We found no significant interaction effects on the cumulative head yaw angle (all tests, p > .171, $\eta_p^2$ < .077), indicating that the head yaw angle did not differ between factor combinations. However, we found a significant main effect of amplitude (F(1.323, 25.135) = 6.060, p = .014, $\eta_p^2$ = .242). Post-hoc tests indicated that every combination of amplitudes significantly differs (p < 0.01, Figure 5).

Taskload (RAW NASA TLX)
Figure 8 illustrates the results by Technique and Level of Control.
5.5.1 Effort. For effort, Technique had a significant impact on the coarse alignment task ($\chi^2(4) = 16.226$, $p = .003$). Here, we found a significant difference between Controller Snap and Dwell Snap ($p = .001$). Technique also had a significant effect on effort for the precise alignment task ($\chi^2(4) = 12.985$, $p = .011$). However, we found no significant difference with post-hoc tests. Additionally, only Head Gain ($\chi^2 = 11.842$, $p = .001$) had a significant difference between tasks, with higher effort for the precise task.
Even so, post-hoc tests showed no significant differences. Technique also had a significant effect on frustration for the precise task ($\chi^2(4) = 15.078$, $p = .005$). Controller Snap had significantly lower frustration than Dwell Snap ($p = .044$) and Head Gain ($p = .023$). Only Head Gain ($\chi^2 = 4.571$, $p = .033$) and Gaze Gain ($\chi^2 = 5.400$, $p = .020$) showed significantly different levels of frustration between tasks, with it being higher for the precise task.

Mental Demand.
We found no significant differences between techniques in mental demand for the coarse alignment task. Technique did have a significant effect on mental demand for the precise task (χ²(4) = 13.081, p = .011), but post-hoc tests showed no significant differences. Finally, comparing the two tasks within each technique, Gaze Gain (χ² = 6.368, p = .012) differed significantly, with higher mental demand for the precise task.

Physical Demand.
In the coarse alignment task, Technique had a significant effect on physical demand (χ²(4) = 10.049, p = .040). However, post-hoc tests were not significant. Technique also had a significant effect on physical demand for the precise alignment task (χ²(4) = 15.735, p = .003). Here, post-hoc tests showed that Dwell Snap had significantly higher physical demand than Controller Snap (p = .014) and Gaze Gain (p = .007). Only Gaze Gain (χ² = 5.556, p = .018) and Gaze Pursuit (χ² = 8.895, p = .003) differed significantly between tasks, with physical demand being higher in the precise task.
5.6 Simulator Sickness Score
5.6.1 Disorientation. Technique had no effect on disorientation, neither for the coarse nor for the precise alignment task. The task itself also had no effect on disorientation (all p > .05).
5.6.2 Nausea. Technique did not have a significant effect on nausea for the coarse alignment task. However, Technique had a significant effect on nausea for the precise task (χ²(4) = 9.589, p = .048). Post-hoc tests showed no significant differences. Only Head Gain (χ² = 5.333, p = .021) differed significantly between tasks, with higher nausea for the precise task.
5.6.3 Oculomotor symptoms. Technique had no effect on the oculomotor symptoms for any task. However, Head Gain (χ² = 4.571, p = .033), Gaze Gain (χ² = 7.143, p = .008), and Gaze Pursuit (χ² = 4.000, p = .046) differed significantly between tasks, with oculomotor symptoms being higher for the precise task.
5.6.4 Total Score. Technique had no effect on the total score for any task. Similar to the oculomotor symptoms, Head Gain (χ² = 4.765, p = .029), Gaze Gain (χ² = 9.800, p = .002), and Gaze Pursuit (χ² = 4.000, p = .046) had a significantly higher total score for the precise task than for the coarse task.
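The questionnaire statistics reported above can be checked for internal consistency with a few lines of arithmetic. The following is a minimal sketch, assuming that the test statistics follow a chi-square distribution, with 4 degrees of freedom for the comparison across the five techniques (as stated in the text) and 1 degree of freedom for the comparison between the two tasks. The single degree of freedom is our assumption and is not stated in the text. The function and variable names are illustrative only.

```python
import math

def chi2_p_df1(x: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def chi2_p_df4(x: float) -> float:
    """Upper-tail p-value with 4 degrees of freedom (closed form for even df)."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

# (statistic, reported p) pairs quoted in the text for the effect of Technique.
OMNIBUS_DF4 = [
    (16.226, 0.003), (12.985, 0.011), (15.078, 0.005), (13.081, 0.011),
    (10.049, 0.040), (15.735, 0.003), (9.589, 0.048),
]

# (statistic, reported p) pairs quoted for coarse vs. precise task comparisons.
# Assumption: these have 1 degree of freedom.
BETWEEN_TASKS_DF1 = [
    (11.842, 0.001), (4.571, 0.033), (5.400, 0.020), (6.368, 0.012),
    (5.556, 0.018), (8.895, 0.003), (5.333, 0.021), (7.143, 0.008),
    (4.000, 0.046), (4.765, 0.029), (9.800, 0.002),
]

def max_abs_deviation(pairs, p_func) -> float:
    """Largest gap between the recomputed and the reported p-value."""
    return max(abs(p_func(x) - p) for x, p in pairs)

for x, p in OMNIBUS_DF4:
    print(f"df=4  chi2={x:7.3f}  reported p={p:.3f}  recomputed p={chi2_p_df4(x):.4f}")
for x, p in BETWEEN_TASKS_DF1:
    print(f"df=1  chi2={x:7.3f}  reported p={p:.3f}  recomputed p={chi2_p_df1(x):.4f}")
```

Under these assumptions, the recomputed values agree with the reported p-values to within rounding at three decimals.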

Summary of Results
In summary, the main insights for the evaluation of our techniques are:
• Gaze is effective for viewport control. Participants were able to complete tasks successfully, with low error rates and completion times comparable to manual control. This demonstrates the viability of gaze for viewport control.
• The study did not reveal significant differences in performance, task load, or simulator sickness between our novel gaze-based techniques and established baselines. This is encouraging as it positions gaze as a genuine alternative to methods that depend entirely on the head or hand.
• We found no significant effects of Technique on the amount of eye versus head movement. This is surprising as techniques differ in reliance on eye movement relative to the head. However, technique differences appear masked by individual differences in the tendency to support gaze shifts with head movement [11].
• Technique differences showed only in specific conditions with higher error rates, pointing to different challenges depending on technique behavior.

DISCUSSION
This work is the first to demonstrate gaze for viewport control. Gaze is not straightforward to harness for control as we rely on the movement of our eyes primarily for our visual sense. Viewport control presents a particular challenge in this respect, as the task involves searching through the viewport and aligning the camera towards a target, both inherently visual subtasks that a control method needs to accommodate. Our work shows how viewport control by gaze can be realized in fundamentally different ways, as seen in the implementation of Dwell Snap, Gaze Gain, and Gaze Pursuit.
Our techniques enable horizontal 360° viewport exploration with only a user's gaze as input. As a headline result from our evaluation, we found all three techniques effective for the task, and hence, gaze viable as a modality for viewport control. The motivation of our work was to facilitate surround viewing in HMDs with less movement than required in the default mode, where the view is directly coupled with the head. We, therefore, compared our techniques with Controller Snap and Head Gain as existing methods designed for the same purpose. Our study found no significant difference among techniques on key measures. Participants were as efficient with our new techniques as with the baselines and reported comparable task load and simulator sickness. This positions gaze as an alternative that is on par with existing methods in performance and usability, yet not reliant on hand or head movement.
Hands-free viewport control with less or no head movement has wide-ranging practical relevance. Gaze can be of importance for accessibility and especially for users who have limited or no head or hand movement due to injury or disability. Gaze can also facilitate viewport control in any situation that constrains other movement, for example, in crowded areas, sitting or standing in a tight space or close to others, or when relaxing or lying down in a comfortable posture. Gaze-based viewport control would be preferable when more conspicuous movement is socially awkward, for instance, on public transport. Gaze-based techniques are also a useful alternative to Controller Snap when applications do not rely on controllers for other input, for example, when the interaction is based on hand tracking or when viewport control constitutes the main interaction, such as in cinematic VR. However, our techniques are equally relevant when controllers are available, as they free the controllers for other input tasks, in the same way that manual control is separate from viewport control in the default head-coupled mode.
Dwell Snap, Gaze Gain, and Gaze Pursuit differ substantially in how they enable viewport control, each with distinct affordances that may cater to different applications. The techniques are based on different types of eye movement (fixations, saccades, and smooth pursuit) and control the viewport differently: in discrete steps, by amplified rotation, or by continuous motion. Gaze behavior is affected by the nature of the content presented, for instance, motion in the scene, and by the user's task, for instance, focused attention versus casual browsing. We consider Gaze Pursuit best suited for video-like experiences such as 360° movies, as it would leverage the eyes' natural ability to follow the moving content presented. Gaze Gain is distinct among our techniques (and also different from Controller Snap) in preserving a sense of direction, as the user always returns to their initial view when they gaze straight ahead in a face-forward position. This makes the technique preferable for experiences that benefit from a sense of direction or spatial awareness, such as virtual galleries or first-person exploration games. Dwell Snap, in turn, may excel in applications that require fast switching between different areas and workspaces, such as work-like and information-dense scenes, multi-screen setups, data analysis, or immersive teleconferences. The technique affords a stable view that can be visually explored with gaze in a natural eye-in-head movement range of up to ±25° without moving the viewport, while supporting fast switching by fixation beyond that range. Naturally, there will also be application contexts in which our techniques may function less well. For example, fast-paced applications such as VR fighting games require rapid reactive gaze to events across the scene, which might lead to inadvertent viewport effects. However, our techniques showcase a range of concepts and behaviors with scope for adaptation and tailoring to different application requirements.
Gaze is complex not only in how it affords both vision and control but also in how it combines eye and head movement. Our techniques make viewport control possible without head movement while not prohibiting or disregarding any natural head movement contribution to gaze. Gaze Gain relied on saccades, which we expected to be naturally supported by head movement, so that head movement would seamlessly aid the refinement of the viewport orientation. In fact, we observed most errors with Gaze Gain for short gaze shifts (15°) that did not implicitly trigger head movement, leaving participants to rely on less precise eye saccades for camera alignment. Dwell Snap required eye movement to more eccentric angles within the HMD, with head movement of no utility for control. Here, we also observed more errors for shorter amplitudes, with participants moving too quickly and overshooting. Gaze Pursuit relied on eye rotation to an offset from the head to trigger rotation of the scene through the viewport, during which the head could simultaneously move to reach the target faster. Here, we saw more errors at larger amplitudes when the head was at maximum rotation, which hampered the head during the refinement of camera alignment.
In spite of the differences in design, we did not find any significant differences in cumulative eye and head movement between any of the techniques studied. However, despite a sample size of N = 20 and 3 blocks, we found high standard deviations (also for other measures). We believe that fundamental differences between humans, rather than technique and task, are the reason for these relatively large variances. Research in eye-head coordination has found significant idiosyncratic variations in people's tendency to support gaze with head movement and suggested that people fall into groups of "head-movers" and "non-head movers" [11,50]. Specifically, in ranges up to 45° from the central position, gaze targets can be acquired completely without head movement through gaze at more eccentric angles, or by using both eye and head movement. In our study, we found, for example, that 40% of the participants using Gaze Pursuit moved their head less than 15° across all coarse alignment trials (cf. Figure 5), while others made more extensive use of head movement to complement eye-controlled view rotation.
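The share of participants with little head movement can be derived from per-trial measurements as sketched below. This is a minimal illustration, not the analysis code used in the study. It assumes that "less than 15° across all trials" means that every trial of a participant stays below the limit, and the participant identifiers and angle values are hypothetical.

```python
def fraction_below(head_yaw_by_participant: dict, limit_deg: float = 15.0) -> float:
    """Fraction of participants whose cumulative head yaw stays below
    limit_deg in every trial (assumed reading of 'across all trials')."""
    low_movers = [
        participant
        for participant, trials in head_yaw_by_participant.items()
        if trials and max(trials) < limit_deg
    ]
    return len(low_movers) / len(head_yaw_by_participant)

# Hypothetical cumulative head yaw angles (degrees) per coarse alignment trial.
example = {
    "P1": [3.2, 7.9, 12.4],     # stays below 15 degrees in every trial
    "P2": [40.1, 88.0, 15.5],
    "P3": [1.0, 2.5, 0.8],      # stays below 15 degrees in every trial
    "P4": [14.9, 16.0, 9.0],    # one trial exceeds the limit
    "P5": [60.0, 120.3, 33.3],
}

print(f"{fraction_below(example):.0%} of participants stayed below 15 degrees")
```

A stricter or looser criterion, such as the mean over trials, would yield a different grouping into "head-movers" and "non-head movers".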
Our work comes with several limitations. We evaluated our techniques on an abstract task in a plain virtual environment to establish baseline performance. The abstract task followed an established protocol but did not include distractions that might influence performance in more realistic application contexts. Our techniques have the potential to support a wide range of applications on account of their different affordances, but further work will be needed to analyze the design space and evaluate techniques for various applications and use cases. We conducted our study with participants in a seated position as this is a common setting that limits the viewing range in a head-coupled display. Our techniques also aim to facilitate viewport control in other poses that may constrain head movement further, but these were not tested. We designed our techniques to be usable without any contribution of head movement but chose not to constrain head movement artificially, as we aimed for gaze control to be in natural coordination of eye and head movement. As a result, differences between techniques may have become masked by individual differences in the tendency to support gaze with head movement. This suggests further work to better understand and support individual differences. The use of gaze, of course, also has practical limitations as it relies on eye tracking. However, compared with other eye-tracking tasks, camera control will typically not require as much precision and will be more readily feasible within the limits of state-of-the-art eye tracking.

CONCLUSION
In this work, we presented and evaluated three gaze-based viewport-control techniques: Dwell Snap, Gaze Gain, and Gaze Pursuit, each leveraging a different type of eye movement. Compared with head amplification and controller snap as baselines, we could show that all three of our techniques achieve competitive performance during coarse and precise viewport alignment. Participants did not systematically make more errors with our techniques and did not need more time to align the viewport, and their eye and head movements stayed normal. Our results also highlight large variations in performance, possibly because of varying characteristics of how individuals' eyes and head work together. Together, we could show for the first time that gaze, natural head and eye movements, and natural eye-head coordination can all be integrated into viewport-control techniques. This enables radically new interaction techniques for various applications like 360° video and scenarios such as VR on public transport, when lying down, or for people with injuries or disabilities. Further, our work can inspire novel interaction techniques beyond viewport control, for instance to enhance gaze-based selection in virtual and augmented reality.

Figure 1: Viewport control with Dwell Snap, Gaze Gain, and Gaze Pursuit. Dwell Snap is illustrated by Figure (a). It shows the top-down view of a user with the head looking forward and the eyes looking to the left. If the eyes' yaw angle crosses a threshold and stays there for 400 ms (see graph at the bottom), the viewport snaps once in this direction (first "x" in the graph). Follow-up snaps take only 200 ms if the eyes stay in this area (second to fourth "x"). Dashed lines indicate viewport positions. Figure (b) illustrates the principle of Gaze Gain. If the user's eyes yaw into a direction (small magenta arrow), the viewport rotates into this direction with amplified rotation (red arrow). Backward yaw (yellow arrow) leads to the opposite viewport rotation (blue arrow). Gaze and viewport directions are illustrated in the lower part of (b). Figure (c) shows the working principle of Gaze Pursuit. Here, the user's gaze crosses a threshold, and with that, the viewport starts rotating (red arrow) until the gaze returns to the central field of view. The red lines in the graph correspond to the viewport rotation, whereas the black line corresponds to the eye direction.
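The dwell timing described for Dwell Snap in Figure 1 can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the implementation used in the study: the 400 ms and 200 ms dwell times are taken from the caption, whereas the threshold angle (borrowed from the ±25° eye-in-head range mentioned in the discussion), the snap step size, and all names (DwellSnap, update, threshold_deg, step_deg) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DwellSnap:
    threshold_deg: float = 25.0   # hypothetical eye-yaw threshold
    step_deg: float = 30.0        # hypothetical rotation per snap
    first_dwell_ms: int = 400     # from the caption: first snap after 400 ms
    repeat_dwell_ms: int = 200    # from the caption: follow-up snaps after 200 ms
    _side: int = 0                # -1 = left zone, +1 = right zone, 0 = centre
    _elapsed_ms: int = 0          # time spent in the current zone since last snap
    _snapped: bool = False        # whether a snap already happened in this zone

    def update(self, eye_yaw_deg: float, dt_ms: int) -> float:
        """Advance by one frame and return the viewport rotation to apply (degrees)."""
        side = 0
        if eye_yaw_deg >= self.threshold_deg:
            side = 1
        elif eye_yaw_deg <= -self.threshold_deg:
            side = -1

        # Leaving the zone or switching sides restarts the dwell timer.
        if side == 0 or side != self._side:
            self._side = side
            self._elapsed_ms = 0
            self._snapped = False
            return 0.0

        self._elapsed_ms += dt_ms
        needed = self.repeat_dwell_ms if self._snapped else self.first_dwell_ms
        if self._elapsed_ms >= needed:
            self._elapsed_ms = 0
            self._snapped = True
            return side * self.step_deg
        return 0.0

# Usage: eyes held 30 degrees to the right for nine frames of 100 ms each.
snap = DwellSnap()
rotations = [snap.update(eye_yaw_deg=30.0, dt_ms=100) for _ in range(9)]
print(rotations)
```

In this sketch, the frame on which the gaze first enters the zone starts the timer, so the first snap fires once 400 ms have accumulated and later snaps follow at 200 ms intervals while the gaze remains beyond the threshold.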

Figure 2: Schematic illustration of the angular velocity for Gaze Gain.

Figure 3: Schematic overview of the alignment task. A: Participants first perform an initial alignment. B: The participant searches for the start target that is randomly located at their left. C: The participant has aligned the viewport and confirms. D: The participant selects the second target. E: The participant selects the first target again, which ends the task sequence. The target in E is at the same location as the target in C and the user has to rotate the viewport to select it again.

Figure 7 shows the average task completion time per technique. Analysis with a four-way repeated measures ANOVA on aligned

Figure 4: Error rates by Task (a) and by Amplitude (b) per Technique. Error bars represent the 95% confidence interval. There was a significant four-way interaction and a relevant significant main effect of Level of Control (precise led to more errors than coarse).

Figure 5: Cumulative Head Yaw angle by Task (a) and by Amplitude (b) per Technique. Error bars represent the 95% confidence interval. There were no significant interactions. There was a significant main effect of amplitude: cumulative head yaw angles of every amplitude are significantly different from all other amplitudes.

Figure 6: Cumulative Eye Yaw angle by Task (a) and by Amplitude (b) per Technique. Error bars represent the 95% confidence interval. There were no significant interactions. There was a significant main effect of amplitude. Overall, cumulative eye yaw angles of every amplitude are significantly different from all other amplitudes.

Figure 7: Average Trial Duration by Task (a) and by Amplitude (b) per Technique. Error bars represent the 95% confidence interval. There were no significant differences.

Figure 8: Results of the raw NASA TLX by Technique and Level of Control. Error bars represent the 95% confidence interval.

Figure 9: Simulator sickness questionnaire by Technique and Level of Control. Error bars represent the 95% confidence interval.

Table 1 provides detail for each technique by level of control. Mean error rates were low for coarse control, including with gaze-based techniques (2.1%, 1.0%, and 0.1% with Dwell, Gain, and Pursuit), and moderately higher for precise control (mean error 6.4%, 11.4%, and 6.1% with Dwell, Gain, and Pursuit), however in

Table 1: Mean error rates and standard deviation of each technique by level of control.