Viewer2Explorer: Designing a Map Interface for Spatial Navigation in Linear 360 Museum Exhibition Video



ABSTRACT
The pandemic has contributed to increased digital content development for remote experiences. Notably, museums have begun creating virtual exhibitions using 360-videos, providing a sense of presence and a high level of immersion. However, 360-video content often uses a linear timeline interface that requires viewers to follow the path decided by the video creators. This format limits viewers' ability to actively engage with and explore the virtual space independently. Therefore, we designed a map-based video interface, Viewer2Explorer, that enables the user to perceive and explore virtual spaces autonomously. We then conducted a study to compare the overall experience between the existing linear timeline and map interfaces. Viewer2Explorer enhanced users' spatial controllability and enabled active exploration in virtual museum exhibition spaces. Additionally, based on our map interface, we discuss a new type of immersion and assisted autonomy that can be experienced through a 360-video interface and provide design insights for future content.

INTRODUCTION
During the COVID-19 pandemic, cultural institutions faced closures, leading to an increased demand for at-home experiences as a substitute for physical visits [6,17,54]. To bridge this gap, museums embraced advanced technologies like the metaverse, interactive web, and virtual reality (VR) to recreate cultural experiences virtually [27,68]. These virtual museums present realistic virtual tours and exhibitions, offering visitors a glimpse into the museum's exhibition space from the comfort of their homes [17,18,68]. Common representation methods include 3D environments and panoramic images, providing visitors with direct and autonomous navigation for engaging and active learning experiences.
Highlighting the enhanced freedom of virtual movement, 360-degree videos offer a cost-effective alternative for virtual tours. Unlike traditional video formats, they deliver an enriched sense of reality, allowing viewers to control their viewpoints and immerse themselves in a 3D environment, capturing the vibrant atmosphere of exhibition spaces [54,68]. This enhanced controllability aligns with the concept of 'autonomy' [37], granting visitors a sense of self-agency in head movement that traditional videos lack. Through this autonomy, viewers derive satisfaction from aligning their gaze with their intentions [70] and foster a deeper connection to the virtual environment [22].
However, unlike traditional virtual environments with panoramic images, 360-degree videos limit the manipulation of orientation and position. Consequently, 360 virtual tours emphasize interactivity, providing narrative development options while relying on borrowed perspectives [2], rather than actively experiencing changes in orientation and position. This limitation invokes a distinct form of immersion compared to traditional virtual environments. The two forms of immersion can be differentiated as "immersion as transportation" and "immersion as absorption" [4]. While viewers immersed in traditional virtual environments experience a sense of transportation as they explore the virtual world and interact with entities, 360 video viewers, in contrast, become deeply engaged in specific situations in a virtual environment and experience immersion as absorption.
In our study, we deviate from this conventional approach to extend the manipulation range within the gaze-centric constraints of 360 video. We aim to introduce spatial navigation capabilities, including orientation and position manipulation, even in a 360-video virtual exhibition tour. The goal is to explore whether viewers can transition from passive video watching to active exploration, so that viewers are immersed as 'transported' rather than 'absorbed'.
Furthermore, various platforms are available for viewing 360 videos, and the web has emerged as one of the most widely used [23,75]. Accordingly, numerous virtual tours utilizing 360 videos have been created. However, despite the expanded support for the 360 video format, conventional timeline interfaces persist when viewing 360 videos in a desktop environment [43,69]. In a desktop-based virtual environment, where changes in direction and position are not directly controlled by the physical body movement of the user, indicators providing users with directional and positional information should be designed to facilitate spatial navigation [56]. Interfaces that aid navigation or wayfinding in space are commonly based on maps [32,41,44]. Like finding directions using a map and compass in unfamiliar places in the real world, map-based interfaces have been employed to offer users efficient spatial exploration in the virtual world [61,67]. Therefore, we aim to propose an interface supporting spatial interactions, allowing the manipulation of the videographer's viewpoint, path, and location in 360 video content without the constraints of a timeline-based interaction.
Our study introduces Viewer2Explorer, a map-based video interface enabling users to actively perceive and navigate virtual space represented through 360 videos (see Fig. 1). In contrast to the conventional timeline interface, which makes the viewer follow a pre-defined flow, this interface leverages spatial controllability, supporting the active and random exploration of virtual tour spaces. To evaluate the interface, we created a virtual exhibition prototype using a 360-degree walking tour video, a popular type of content for exploring remote spaces. Dividing 24 participants into two groups, we compared the overall experience between the existing timeline and designed interfaces. The map-like interface demonstrated enhanced spatial controllability, supporting viewers' active exploration of virtual tour spaces. Our proposed 360 video navigation interface contributes in the following ways.
• Active Exploration Experience: Unlike the somewhat passive viewing experience associated with 360 video virtual tours using conventional timeline-based interfaces, Viewer2Explorer introduces a map-based video interface. This interface provides users with a more comprehensive range of spatial navigation in video, surpassing the limitations of traditional methods and fostering an engaging exploration experience.
• Temporal to Spatial Expansion: Through extensive analysis of user experiment data from surveys and use logs, we substantiate that our method transcends the temporal constraints of existing timeline interfaces, allowing viewers to explore the 360 video environment more independently and autonomously.

RELATED WORK
2.1 Video Interactions for Viewer's Autonomy
The general video format has a passive characteristic that lets one watch and experience the space from the videographer's perspective without any manipulation [48]. However, the user can break the predetermined flow of images and move freely in space for a more immersive experience [14]. Two approaches have been presented that enable viewers to engage and immerse themselves in the experience rather than accepting a passive one. The main focus was on expanding the navigation scope and inducing active observation and exploration by viewers [3,42,45,52]. The first approach gives viewers a choice to engage with the video content. Such interactive or branched videos enable viewer decision-making in the video process, gathering decisions that lead to a large nonlinear video viewing experience [11,33,64]. Netflix's Black Mirror: Bandersnatch is an example of an interactive video that designs different endings based on viewers' choices [55]. It invokes curiosity and pleasure in the viewer regarding the video itself, enabling them to take the initiative in the story flow while watching the video in an interactive format. The second approach presents annotations in videos so that viewers can selectively watch what they want. For instance, Kallioniemi et al. placed hotspot icons in 360 videos so that users could selectively browse the details of the content with their eyes [26]. Matos et al. placed annotations such as miniature guides, arrows, and mini-maps on a 360 video screen to guide viewers and prevent them from missing key sections [41]. These interfaces provide visual guidelines to give users autonomy while immersively engaging them in the experience.
Based on this previous research, attempts to expand the scope of navigation through interactive videos are developing so that viewers can directly interact with the experience. Although previous studies have indicated that viewers are able to engage with content according to their intentions, there were limitations in manipulation. Thus, viewers must eventually follow the storyline along the prearranged movement owing to the narrative characteristics of the video medium. In this study, beyond following and passively observing the narrative of the video with augmented autonomy, we focused on providing an experience in which the user directly becomes the subject and explores the virtual environment represented by 360 videos, as in a physical and actual exhibition environment.

2.2 Spatial Navigation in Video
A traditional video interface provides a timeline slider for viewers to search for specific points in time [7]. To give more autonomy to viewers, there have been attempts to expand the scope of viewer manipulation through spatial navigation within the video. For instance, some methods enable users to manipulate locations in multiple directions in the video space or assist viewers in recognizing geographical information and searching for the desired route accordingly [21,32]. Furthermore, attempts have been made to use physical map coordination or GPS for viewers to perform spatiotemporal navigation [32,44,49,58,66].
Unlike 2D videos, 360 videos provide a high level of gaze manipulation freedom, but distraction can occur more frequently. Therefore, navigation interfaces using various spatial indicators have been developed that help viewers navigate to major parts of the video by providing a pathway, door, or link on objects in the video [16] or marking mini-maps and orientations to identify their viewing area and direction [47,49]. In addition, previous research provides spatial information using slit-scan visualization above the video timeline [36]. These methods assist users in searching for the specific time point in which they are interested or understanding the surroundings in the current scene. However, these studies lack spatial information on the recorded space, which is essential for users to plan a route or understand the storyline of the content.
Moreover, there have also been attempts to apply a navigation interface to trigger motivation while watching 360 virtual exhibition videos, enabling viewers to learn more about cultural relics [1,2]. They adopt a branching narrative approach to give viewers a choice of the path they want to take and to stimulate their motivation to explore the historical site with a sense of accomplishment by receiving different scores according to their choices. Other research utilized navigation interface components (i.e., map, spatial marker) to provide a co-viewing experience of a 360 heritage tour video [34]. However, most of this research focuses on delivering the core message of GLAM (galleries, libraries, archives, and museums) institutions by maintaining users' attention on the course.
Providing an efficient interface is essential, but guiding viewers to actively explore the GLAM space is important for users to fully immerse themselves [31,53,71]. In addition, since current 360 videos are mostly based on 2D web interfaces limited to the screen size, developing an interface that provides spatial information intuitively, augmenting the sense of presence and inducing active exploration, is essential. Therefore, we designed an interface that utilizes a map synced with the timeline and the spatial information of the physical museum, which could enhance the viewer's sense of presence and active exploration through spatial understanding. To verify this and gain insight, we probed how users explore the remote space and experience the museum exhibition contents through our interface and compared it with the previous timeline interface.

360 VIDEO SPATIAL NAVIGATION INTERFACE DESIGN
Previous methods of spatial orientation have been developed to convey direction or location information, especially using a head-mounted display (HMD) to view 360 videos. However, various web platforms, such as YouTube, now support 360 videos. In this study, we tried to maintain the user's mental model built with the existing timeline interface while expanding the spatial manipulation of 360 videos.
Based on previous studies on various platforms (e.g., 360 video [1], virtual exhibitions [2,38], topographic interfaces [32,44,47], and video games [2,40,57,73]), we propose a 2D web-based video interface that enables viewers to actively explore the virtual space in 360 video by providing a more comprehensive range of spatial navigation. In addition, the interface enables viewers to create their own exhibition experiences through active and nonlinear exploration, even in timeline-based linear video content. The specific design and implementation of the interface are described in Fig. 2. By implementing the designed interface, we aimed to study how viewers' experience changes when temporal navigation is expanded into spatial navigation.

3.1.1 Spatial Rather Than Temporal Search. Existing video interfaces are designed as linear timelines such that users can browse along the time axis [7]. Because the video does not represent the space in 3D but only shows a screen subordinate to the camera's gaze, the viewer finds it difficult to comprehend the exhibition content outside the screen or the overall video space [48,56].
However, this study focused on a lean-forward experience, in which viewers interact with and control the flow of the media in an active manner [28], exploring the exhibition space through 360 videos. Therefore, optimizing the navigation functions for spatial search rather than temporal search is essential.
3.1.2 Motivation for Active Exploration. Free and selective learning in the museum is based on the curiosity and expectations of visitors [15]. In addition to internal and personal motivation, museums have provided external motivation through environmental factors such as lighting and signs that draw viewers' attention. These motivations are related to the overall satisfaction of the exhibition experience [74]. As this study aims to induce both external and internal motivation, factors that affect motivation for active exploration should be considered when designing a map interface for virtual exhibitions.

3.1.3 Video Navigation that Enhances Presence. When people feel a high level of presence in a virtual world, they act as they would in real life [60]. In addition, in an environment mimicking the real world, people tend to show more autonomous participation through their knowledge or skills. Therefore, interactions occurring in a virtual environment have been designed such that users can perceive them as realistic experiences [2,24]. This study devises a concept that can give viewers a sense of realism, as if they were in an exhibition space, through 360 videos filmed in advance without a complete digital representation. We attempted to convey the feeling of actual locomotion and exploration in the exhibition space through filming and video navigation methods close to realistic movements to induce presence and realism in viewers.

3.2 Video Design
This study produced a 360 video museum exhibition by filming a special exhibition, "Human, Material, and Modification - 10,000 Years of Finnish Design," held at the Cheongju National Museum, South Korea. Before starting the filming, we checked the route people could travel by looking at a bird's eye view of the entire exhibition hall, as shown in Fig. 3. Next, the route was divided into four paths, one (entrance), two, three, and four (exit), according to the order of the subtopics assigned by the exhibition organizer. Filming was conducted using a 360 camera (Ricoh Theta V) for each path. The videographer walked along each path with the camera fixed above the head. The four films were connected in the order of the path numbers to produce a full video (resolution: 3840 × 1920, bit rate: 56 Mbps; full video length: 5 min 17 s; path 1: 1 min 22 s, path 2: 1 min 10 s, path 3: 1 min 32 s, path 4: 1 min 13 s). As shown in Fig. 3, paths 2, 3, and 4 start at crossroads, indicating where two videos were stitched together. The path 2, 3, and 4 videos were filmed and edited to start at the same location and direction where the previous path ended so that two videos could be continuous when leading to another path at a crossroad. Because the video was filmed using a 360 camera, the viewer could modify the view by manipulating the gaze direction while playing or pausing the video. Without manipulation, the video reproduced the screen filmed from the videographer's perspective.
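As a worked check on the figures above, the four segment lengths sum exactly to the full video length (1 min 22 s + 1 min 10 s + 1 min 32 s + 1 min 13 s = 5 min 17 s). The following Python sketch, which is illustrative and not part of the paper's Unity implementation (all names are hypothetical), also derives each path's start offset in the stitched video:

```python
# Segment lengths in seconds, from the reported path durations.
PATH_DURATIONS_S = {1: 82, 2: 70, 3: 92, 4: 73}  # 1:22, 1:10, 1:32, 1:13

def path_start_times(durations):
    """Cumulative start time (seconds) of each path in the stitched video."""
    starts, t = {}, 0
    for path_id in sorted(durations):
        starts[path_id] = t
        t += durations[path_id]
    return starts

starts = path_start_times(PATH_DURATIONS_S)
total_s = sum(PATH_DURATIONS_S.values())  # 317 s = 5 min 17 s
```

These start offsets are what a map interface would need to translate between "position along path n" and a timestamp in the single stitched file.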

3.3 Viewer2Explorer: Map-based Interface Design
In this study, we designed Viewer2Explorer, an interface that combines a map interface with a conventional timeline to support viewers' spatial navigation while viewing 360 videos. As with conventional 2D web interfaces for video streaming (e.g., YouTube [46], Netflix), participants used a keyboard and a mouse to operate the interface. The specific design and implementation of the interface are described as follows.


3.3.1 Structure. This study devised an additional map interface based on the previous design considerations to enable viewers to manipulate gaze direction and spatial location without using HMDs.
The interface had three main elements: a color-coded path, a viewer indicator, and coins (see Fig. 4). First, different colors were assigned to each path to distinguish them and provide viewers with information about the movable space. The viewer indicator moves along the path while the video is playing and visualizes the line of sight and position of the viewer on the map. For instance, the field of view of the indicator rotates according to viewing direction manipulation with a mouse drag. Finally, the color and location of each coin indicate the location of an exhibit in the exhibition space.
In addition, the coin is designed to function as a motivating element that induces viewers to explore the entire space: each coin changes from filled to empty when the viewer passes the corresponding exhibit.
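The coin mechanic described above can be modeled as follows. This is a hypothetical Python sketch, not the paper's Unity implementation; the exhibit names and trigger times are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Coin:
    exhibit_id: str
    trigger_time_s: float  # video time at which the indicator passes the exhibit
    filled: bool = True

def update_coins(coins, current_time_s):
    """Empty every coin whose trigger point has been passed; return visited count."""
    for coin in coins:
        if coin.filled and current_time_s >= coin.trigger_time_s:
            coin.filled = False
    return sum(not c.filled for c in coins)

coins = [Coin("exhibit-a", 30.0), Coin("exhibit-b", 95.0)]
visited = update_coins(coins, 60.0)  # only exhibit-a has been passed so far
```

The visited count makes the "explore the entire space" goal explicit: the viewer can see at a glance how many exhibits remain filled on the map.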

3.3.2 Interaction.
Similar to moving to a specific time point by clicking on the timeline in the conventional video interface, viewers can move to a specific location by clicking on a particular spot on the path in the map interface. Since the path was created by bending the timeline, manipulating the video through the map is the same as manipulating the video's timeline. However, we hypothesized that the map could induce users to feel that they are navigating spatially rather than temporally.
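Because each colored path segment covers a contiguous time range of the video, a click at a given fraction along a segment maps linearly to a timestamp. A minimal sketch, assuming segment boundaries derived from the reported path lengths (all names hypothetical, not from the paper's implementation):

```python
def click_to_video_time(segment_ranges, path_id, fraction):
    """Linearly interpolate a click position (0..1 along a path) into video time."""
    start, end = segment_ranges[path_id]
    fraction = min(max(fraction, 0.0), 1.0)  # clamp clicks at the segment ends
    return start + fraction * (end - start)

# Time ranges per path, derived from the stitched video's path durations.
SEGMENT_RANGES = {1: (0.0, 82.0), 2: (82.0, 152.0), 3: (152.0, 244.0), 4: (244.0, 317.0)}

t = click_to_video_time(SEGMENT_RANGES, 3, 0.5)  # click at the middle of path 3
```

Under this mapping, clicking the map is literally a seek on the timeline, yet the gesture is expressed in spatial terms, which is the cognitive shift the study hypothesizes.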
In addition, our interface attempts to induce viewers to explore space actively by continuously giving them options, even during passive viewing without moving the indicators. For example, whenever the video of path n ends and faces a crossroads, the video is stopped, and the viewer can choose the direction in which they want to proceed by clicking an arrow in the map interface. The video then proceeds along the path the viewer chose. Viewers can manipulate gaze direction with the mouse while the video plays. Choosing a route at a crossroads could be difficult if the gaze direction was not kept aligned with the videographer's when the video stopped. To prevent this, we slowly rotated the view direction to match the videographer's field of view when the video was paused.
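The pause-time gaze realignment can be sketched as an eased yaw rotation toward the videographer's heading, taking the shorter way around the circle. This is a hypothetical Python model; the paper does not specify its implementation, and the step size would be tuned per frame:

```python
import math

def step_toward_heading(view_yaw_deg, target_yaw_deg, max_step_deg):
    """Rotate view_yaw toward target_yaw by at most max_step, via the short way."""
    # Signed shortest angular difference in (-180, 180].
    diff = (target_yaw_deg - view_yaw_deg + 180.0) % 360.0 - 180.0
    if abs(diff) <= max_step_deg:
        return target_yaw_deg % 360.0  # close enough: snap to the target heading
    return (view_yaw_deg + math.copysign(max_step_deg, diff)) % 360.0
```

Calling this once per frame while the video is paused slowly brings the viewer's field of view back to the videographer's, so the crossroads arrows appear in a predictable direction.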

3.3.3 Video control.
In the map interface, users can manipulate the movement of the videographer through the four functions used in the timeline interface (see Fig. 5). The video navigation functions consist of play/pause, faster, slower, and skip buttons. Viewers could use these four functions with the keyboard arrow keys and the space key, as in the timeline interface. In addition, the faster, slower, and skip functions were redesigned in the prototyping process to reduce the awkwardness of replacing video control functions originally designed for temporal interaction with movement control for spatial navigation.
The detailed actions of each function are as follows. First, the speed control (i.e., the faster and slower functions) was modified to accelerate or decelerate only while the key was pressed. The timeline interface's speed control behaves like a hardware toggle button: when pressed once, the state changes, and the video speed remains constant until the key is pressed again. Toggle buttons are suitable for situations where discrete state changes are infrequently required. However, this was replaced with a momentary switch interaction in the map interface to provide continuous control, enabling viewers to accelerate and decelerate more comfortably and naturally. This choice was made considering viewers' frequent use of speed control for active exploration within the virtual space.
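The contrast between the two behaviors can be made concrete with a small sketch (hypothetical Python, with an illustrative 2.0x fast rate; the paper does not report the actual rates):

```python
def toggle_speed(current, pressed, fast=2.0, normal=1.0):
    """Toggle semantics: one press flips the speed, which then persists."""
    if pressed:
        return fast if current == normal else normal
    return current

def momentary_speed(key_held, fast=2.0, normal=1.0):
    """Momentary semantics: fast only while the key is physically held down."""
    return fast if key_held else normal
```

With toggle semantics the fast state persists after the key is released, whereas the momentary switch returns to normal speed the instant the key is let go, matching the walking-then-stopping rhythm of spatial exploration.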
In the map interface, the skip function was also transformed into an interaction that lets viewers skim through the space covered in the next five seconds rather than jumping directly to a time five seconds later. This function was designed to provide a smooth spatial transition, helping viewers intuitively understand how they arrived at a location five seconds in the future. Consequently, even when the user transitions to the space five seconds ahead, viewers perceive it as a rapid movement within the same space rather than a separate scene.
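One way to realize such a spatial skip is to present the intervening five seconds of footage at a high playback rate instead of cutting. The sketch below is a hypothetical Python model (the skim rate, frame rate, and names are illustrative and not taken from the paper):

```python
def skim_schedule(current_time_s, skip_s=5.0, skim_rate=10.0, fps=30.0):
    """Return the sequence of presented video timestamps while skimming ahead.

    The skip_s interval is traversed at skim_rate x speed, so the skim itself
    occupies skip_s / skim_rate wall-clock seconds of fps display frames.
    """
    frames = int(skip_s / skim_rate * fps)  # wall-clock frames spent skimming
    step = skip_s / frames                  # video time advanced per frame
    return [current_time_s + step * (i + 1) for i in range(frames)]
```

Every intermediate timestamp is shown, so the viewer sees a rapid but continuous traversal of the space rather than an abrupt scene change.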

3.4 Implementation
Based on the design considerations, we implemented these interfaces with the Unity Engine (2019.4.2f1). The 360-degree video was recorded using a Ricoh Theta V and played with a 360 video player asset (Interactive360). This asset provides basic play, pause, and skip functions and information on the current video status, such as the current frame and time. We used these functions and this information to map the video status onto the progress bar and button input interactions. The map interface was implemented based on the progress bar mechanism, but the bar was split wherever the route turned (at intersecting or bending points) and placed on the map image. A corresponding video time range was assigned to each progress bar to show the appropriate video when viewers pressed the bar (position control). The viewer indicator was synced with the video time and camera rotation. For the coin interaction, we used a collision function to remove coins once viewers had passed through the path.
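The split-progress-bar mechanism can be sketched as a lookup from playback time to a (segment, fraction) pair, which is how a viewer indicator could be positioned on the map. This is a hypothetical Python sketch rather than the Unity implementation; the cut points follow from the reported path lengths, though the actual bars were also split at bends within paths:

```python
from bisect import bisect_right

def locate_on_map(cut_times_s, current_time_s):
    """Return (segment index, fraction along that segment) for the current time."""
    i = bisect_right(cut_times_s, current_time_s) - 1
    i = min(max(i, 0), len(cut_times_s) - 2)  # clamp to a valid segment
    start, end = cut_times_s[i], cut_times_s[i + 1]
    return i, (current_time_s - start) / (end - start)

# Segment boundaries of the stitched video, in seconds.
CUTS_S = [0.0, 82.0, 152.0, 244.0, 317.0]
```

Pressing a bar inverts this lookup (segment + fraction back to a timestamp), so position control and the indicator share one mapping.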

USER TEST
Compared to the conventional time-based video interface, Viewer2Explorer is expected to induce the user's proactive search and provide an expanded spatial experience while watching 360 videos. To evaluate and explore the experience with the Viewer2Explorer interface, we implemented prototypes and analyzed two different interfaces: timeline (T condition) and timeline + map (T+M condition, equal to Viewer2Explorer). The user study was conducted in a controlled experimental environment equipped with a desk, desktop computer, and headphones, as depicted in Figure 6. Participants wore headphones during the exhibition to listen to the video's sound, and the volume could be adjusted according to their preferences. Throughout the experiment, participants used a keyboard and mouse as controllers. The user study focused on observing participants' behaviors and verbal responses to the museum exhibition experience. The built-in applications of Windows 10 (Voice Recorder, Game Bar video recorder) were employed to record participants' voices and desktop screens. To observe whether each element of spatial exploration helped users understand the exhibition space in the video, we analyzed how much subjective exploration was performed and how enjoyable the exploration was. If the video content differed between interfaces, an individual's interest in the exhibition content could change and affect the experience evaluation. Therefore, we used the same exhibition content for both interfaces in the user test. The user test protocols and methods were approved by the university's Institutional Review Board (IRB), and participants were rewarded $10 after participation.

Participants
To ensure that the virtual exhibition was experienced with the will and interest of the participants, we recruited people who had experienced virtual exhibitions other than 360-video virtual exhibitions and enjoyed them. As an icebreaker, we initiated the study with a concise online survey and brief interviews. The Google Forms survey collected basic participant information, such as age and gender, and insights into their virtual museum experiences. Following the survey, participants shared details about their past museum visits, specifying the type of virtual museum they had explored, their favorite museum, and their general viewing habits. Of the 24 participants, 6 indicated no prior exposure to virtual museums. The remaining participants reported engagement with exhibition-related videos, 3D virtual exhibitions, and web exhibits that incorporated both photos and textual information about the exhibits.

Introductory Demo Session.
Prior to the main study, we conducted an introductory demonstration. We provided the participants with the research description and brief information about the exhibition used in the main study. The participants were asked to complete an online survey from which we acquired information about them and their general virtual experiences, such as the types of virtual exhibitions and 360 videos they had watched. After completing the survey, participants were asked to familiarize themselves with the interface and each function using a demo version created with random 360 video content from YouTube [46]. This is because the shape of the interfaces used in the study, particularly the map interface, differs from that of the conventional video navigation interface. The demo session helped participants focus on the visiting experience rather than on interface control.
The demo session lasted until participants became accustomed to the interface.

Main Study Session.
Once the participants were familiar with the interface, they experienced the video exhibition through the interface corresponding to their previously assigned group. Before starting the video, the participants were informed that the overall video duration was 5 minutes and 17 seconds. Participants were instructed to wear headphones while watching and informed that they could watch as freely as they wanted and stop at any time. While the participants watched the video exhibition, their behaviors and the interface functions they used were recorded and carefully observed.

Survey Session.
Immediately after the participants finished watching the video exhibition, they were asked to evaluate their overall experience and interface usability using a 7-point Likert scale, ranging from strongly disagree to strongly agree (one to seven points). The evaluation items were categorized into spatial awareness [72], degree of autonomy [50], museum experience [51], and interface usability [20,62].
First, we hypothesized that the map interface would affect the participants' awareness of the museum space and enhance spatial navigation [12]. Thus, three items from Witmer & Singer's presence questionnaire [72] were selected and tailored to the context of the experiment: spatial presence (perceived physical presence within the virtual space), sense of reality (the authenticity of the experience, indicating to what extent participants felt they were genuinely walking through and exploring the virtual exhibition space, akin to a real-world setting), and self-orientation (the ability to easily locate and navigate within the exhibition space). Building on Lynch's concept of imageability [39], participants were also evaluated on imageability (how well they comprehended and vividly envisioned the virtual exhibition space). The degree of autonomy was measured by factors from the sense of agency scale [50,65]: identification (a sense of identification with the videographer), sense of agency (the perception of agency and the ability for autonomous exploration throughout the entire experience), and perceived interactivity (how many actions participants could take while navigating). The overall museum experience, evaluated through the Museum Experience Scale (MES) and Multimedia Guide Scale (MGS) [51], encompassed enjoyment (the inherent appeal of the video content itself), satisfaction (overall contentment with the experience), engagement (the extent of immersion in the virtual exhibition facilitated by the provided interface), and understanding of contents (comprehension of exhibition content). Finally, interface usability was evaluated with three items based on a study investigating the user experience of on-screen interaction [62]: ease of use, intuitiveness, and natural mapping. The survey also included the NASA-TLX test to check the designed interface's perceived task load compared to the existing interface [20].

Interview Session.
We conducted semi-structured interviews about the characteristic behaviors and speech of the participants observed during the exhibition, using pre-prepared questions about the overall experience.In the T+M condition, additional questions were asked about map interface usage, such as how each element in the interface affected navigation and how the directions were selected at crossroads.

RESULTS
We collected 24 observation recordings (video), 24 logs (timestamp, interface function) recorded as text, and 24 experience surveys and semi-structured interviews recorded as text and audio. All data were coded and analyzed using keywords. We performed the Mann-Whitney U test on the quantitative survey data measured on a 7-point Likert scale in the two conditions (see Figs. 7, 8, and 9). Finally, in Fig. 10, the observation and log data were used to compare participants' behavioral responses and verbal expressions with their actual control inputs.
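For reference, the Mann-Whitney U statistic reported in the following subsections counts, over all cross-group pairs of ratings, how often one group's rating exceeds the other's, with ties counted as half; a significance test additionally requires the U null distribution or a normal approximation, omitted here. A minimal pure-Python sketch:

```python
from itertools import product

def mann_whitney_u(group_a, group_b):
    """U statistic for group_a over group_b: pairwise wins, ties count 0.5.

    Note the identity U_a + U_b = len(group_a) * len(group_b).
    """
    u = 0.0
    for a, b in product(group_a, group_b):
        if a > b:
            u += 1.0
        elif a == b:
            u += 0.5
    return u
```

Small U for one group therefore means its ratings tend to fall below the other group's, which is how the reported values (e.g., U = 18 for sense of agency) are read against the group sizes.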

5.1 Survey
5.1.1 Spatial Awareness. Compared to the T condition, the T+M condition showed statistically significantly higher scores for sense of reality (U = 36, p < .05), self-orientation (U = 38, p < .05), and imageability (U = 37.5, p < .05). The score for spatial presence did not show a statistically significant difference between the two conditions; both conditions had positive scores (see Fig. 7). This result was clarified through brief interviews conducted after the survey. Participants responded to questions about the sense of reality by addressing the realistic representation of bodily movements in manipulation, while spatial presence was associated with the feeling of standing within the exhibition space. Both groups of participants, exposed to prototype video content featuring the actions of the videographer, such as opening a curtain to enter the main exhibition area, and ambient sounds like the murmurs of other visitors, reported a genuine sense of being present in the exhibition space.
While viewing the video, participants in the T+M condition were more aware of where they were located and where they were heading than those in the T condition. P13 expressed that he felt as if he were searching for a route while holding an exhibition brochure in a real exhibition. He added, "Because the map draws a picture in my head, I think it was much easier to revisit (to see what I wanted to see again)," indicating that the map helped him understand the overall layout of the exhibition space. In a similar vein, P21 mentioned that he was able to make his own time plan and watch the exhibits in the planned order, since he could see his real-time position and the remaining exhibits in the map interface. P22 and P24 mentioned that they could discern their future movement path or current trajectory through the map, in contrast to past video-viewing experiences in which it was difficult to concentrate because they could not ascertain their current location.


5.1.2 Autonomy.
The participants in the T+M condition showed statistically higher values for perceived interactivity (U = 38, p < .05) and sense of agency (U = 18, p < .05) than those in the T condition (see Fig. 8). The comparison of identification levels between the T and T+M conditions revealed no statistically significant difference, but participants in the two conditions exhibited distinct patterns of identification. Those in the T condition primarily attributed their sense of identification to the first-person perspective and the subtle screen shaking (P5, P7, P15, P23). Conversely, participants in the T+M group reported identifying with the videographer's actions and viewing themselves as the operator while using the map interface (P14, P18). Overall, we confirmed that use of the map interface triggered cognitive changes in viewers, making them feel that they were watching autonomously. Participants in the T condition felt that the range of operation was small because they could not interact on their own beyond turning their view direction; accordingly, they could not watch the exhibition as they intended. P6 responded, "Actually, it felt like I was on a ride. It felt like the route was fixed, and only the view direction could be turned freely," indicating that P6 accepted the viewing experience passively. This passive attitude also influenced interaction. P1 mentioned, "This moving person is a guide and is guiding me through the exhibition. Therefore, I rely on this guide's movement, because there were few view direction manipulations."
In contrast, participants in the T+M condition responded that they felt they could enjoy the exhibition as they wished, despite knowing that they were watching a pre-recorded video. For instance, P13, who said he usually views exhibitions with a passive attitude, following the prescribed order and guidance, showed a willingness to explore actively, including going back to a specific location through the map. We also confirmed that choosing a route at an intersection can induce active and autonomous viewer choice. P20 said, "If it was a normal video, it would have felt like riding a horse, but it felt less so because I was given a choice." In addition, the expanded manipulation made participants concentrate on the exhibition as if they were playing a game. P17, P18, P20, and P23 responded that they felt the same autonomy as when manipulating characters in a game, because the map was reminiscent of FPS (first-person shooter) games.

5.1.3 Viewing Experience.
In terms of the overall user experience (see Fig. 9), the items that showed statistically significant differences between the two conditions were satisfaction (U = 37.5, p < .05), engagement (U = 25.5, p < .05), and understanding of contents (U = 33.5, p < .05). While satisfaction evaluated the overall experience, enjoyment measured the pleasure derived from the content. Accordingly, in contrast to satisfaction, no significant difference in enjoyment was observed between the two conditions. As mentioned in 4.1, none of the participants had experienced a virtual exhibition through a 360 video. They therefore responded that borrowing someone's attention and walking through the exhibition space from the first-person view was a new and enjoyable experience. However, the low operability of the timeline interface seemed to decrease satisfaction and engagement. For instance, P4 mentioned, "I think it's better to see a prettier picture than to watch an exhibition like this (video). Because I cannot interact with the exhibits in both ways, I prefer to look at a picture for a clearer view rather than watching a video."
In terms of engagement, participants in each condition appeared to have different experiences. Participants in the T condition primarily reported a sense of immersion derived not from their own actions but from viewing the exhibition space through the perspective of the videographer within the video. They described experiencing the enjoyment of assuming or aligning with another person's viewpoint. In contrast, participants in the T+M condition, who used the map interface, perceived the act of viewing the exhibition as their own experience. They regarded the experience of being the agent of choice as satisfying (P13, P19, P24). Furthermore, some participants (P14, P15, P16, P18) expressed satisfaction with the slightly limited but smooth navigation, comparing it favorably to the fully autonomous navigation required in virtual exhibitions they had experienced previously.
For instance, one participant (P6) stated, "I have seen virtual exhibitions where, similar to Google Maps Street View, I need to click to navigate. That kind of exhibition allowed me to view the exhibits in the order I desired, but at the same time, having to move everywhere by mouse clicking felt unnatural, difficult to be immersed in, and somewhat uninteresting. However, with this (video-based virtual exhibition), I could move wherever I wanted, unlike in general video content, and it also felt much more natural and far more enjoyable than those click-to-navigate virtual exhibitions." Another participant (P12) mentioned, "Unlike typical virtual exhibitions requiring me to move step by step to navigate, I could explore the space without much effort to control it, because the video automatically takes me where I want to go."

5.1.4 Interface Usability and Task Workload. As shown in Tables 2 and 3, the results showed no significant difference between the two conditions regarding interface usability and task workload. This result implies that the proposed map interface did not interfere with the overall video-viewing experience and that there was no particular cognitive difficulty in exploring remote spaces with it. Specifically, participants responded that the composition and location of the map interface did not visually interfere with the experience.

Difference of Video Control: Passive vs. Active.
While watching the video exhibition, the same navigation functions were provided in both conditions but with different representations. The different attitudes (passive or active) toward the experience shown by participants in the two conditions, revealed in the statistical results, are related to the method of controlling the video. Although there was no significant difference in the frequency of function usage between the two conditions, as shown in Table 1, the log data implicitly illustrate differences in the functions used by each group. For example, participants in the T condition mostly used the 'play/pause' function to avoid missing the flow of the video. In contrast, in the T+M condition, the videographer's path (e.g., speed, view direction) was actively manipulated using more diverse functions (see Fig. 10). These participants expressed the joy of navigating according to their intentions and interests.
We found more details about these tendencies in the interviews. The passive attitude of participants in the T condition is based on the satisfaction they felt in being immersed in the space delivered to them through the videographer's view. They mentioned that they tried to avoid using the speed controls ('faster' and 'slower') or the 'skip' function to prevent interrupting their immersion (P2, P3, P6, P7, and P8). In particular, they expressed that the continuous flow provided by the video content itself could be disrupted by their manipulation. For example, P6, who did not use any interface function while watching the video, mentioned, "I really liked the feeling of moving. It felt very live, but the moment I pressed the pause button, I thought this flow would turn into a still photo."

Table 3: Statistical analysis results on task workload between two conditions, T and T+M.

In contrast, the function most frequently used by participants in the T+M condition was 'faster,' and other functions were used more diversely than in the T group. As the survey results suggest, they seemed to gain satisfaction from the autonomy they felt while manipulating the footsteps of the videographer. For instance, P16 mentioned, "It was fun to accelerate when the artwork was seen from a distance, and slow down when it came close," describing the pleasure of manipulating the moving speed (video speed) as intended. In addition, P19 expressed satisfaction by describing navigation as "a feeling of walking as fast or slow as I want." P21 and P22 mentioned using the functions differently depending on their degree of interest, and demonstrated that they used the 'slower' and 'play/pause' functions according to their intentions, as in real-world movement.

Comparison with Real-world Behavior.
From the interview results, we found that both groups compared their behavior in a real-world exhibition with their behavior in the video exhibition. Participants in the T condition mainly mentioned what they could have done if they were in a real-world exhibition and how their actions differed in the video-based exhibition. In contrast, participants in the T+M condition mainly mentioned that their usual habits in the real world affected their behavior while viewing the video-based exhibition. For instance, P13, who repeatedly controlled the video speed, mentioned that he passes by and skims exhibits and focuses only on specific exhibits he is interested in. In addition, some participants (P17, P19, and P21) responded that using the acceleration and deceleration functions felt as if they were adjusting their walking speed in an actual exhibition. P18 expressed satisfaction more specifically: "It was nice to see where I was naturally in the process of moving because it shows continuous frames."

DISCUSSION
In this study, we designed the Viewer2Explorer interface, which allows viewers to browse virtual exhibition content by extending the operating range of 360 videos from temporal to spatial navigation. Based on the user test results, we discuss the effect of the Viewer2Explorer interface on the user's exhibition viewing experience in terms of three features: Spatial Awareness, Assisted Autonomy, and Dual Immersion.

Enhancing Spatial Awareness in 360 Video Exploration: The Impact of Viewer2Explorer's Map-Based Interface Design
Conventional video interfaces are structured around temporal operations, enabling users to navigate through scenes using a timeline and time-based functions, including playing, stopping, fast-forwarding, skipping, and rewinding. These designs optimize efficient 'browsing' experiences, reducing seeking latency and facilitating users' quick access to desired time points within the video content [5]. Various strategies, such as incorporating thumbnails with a complete 360-degree view, have consistently enhanced users' understanding of the entire video flow, particularly in 2D screen-based desktop environments [36,47,59].
In contrast to traditional video interfaces, Viewer2Explorer seeks to offer a more effective spatial navigation experience within 360 video content. Rather than prioritizing browsing efficiency, it is designed to facilitate exploration of the spatial dimensions inherent in 360 video content. To address our research question (RQ1), focusing on how Viewer2Explorer's map-based interface supports users' spatial awareness and exploration in 360 video-based content, we conducted a comparison with the widely used conventional timeline video interface. Viewer2Explorer's map interface was validated through a user study, which demonstrated a substantial improvement in viewers' spatial awareness within the virtual exhibition space represented by 360 videos. This enhancement was particularly evident in orientation and positional understanding. Participants in the T+M condition reported an increased sense of direction, with the map and indicator contributing to an overall improvement in spatial awareness related to the exhibited content.
Comparable to the conventional timeline interface, where users click on the timeline to navigate to desired time points, Viewer2Explorer users can traverse to their desired location by interacting with the colored paths in the map interface. Furthermore, participants manipulated functions specifically adapted for spatial movement rather than those designed for temporal navigation. While there was no statistically significant difference in function usage between the two groups, interviews revealed distinct perceptions of the same virtual exhibition. Participants in the T condition used video functions to avoid missing the videographer's movements, adopting a relatively passive viewing approach. In contrast, those in the T+M condition actively controlled the videographer's orientation, position, and movements using various video interface functions. They expressed a focus on intentionally manipulating the videographer's steps according to their interests, incorporating continuous speed control as if regulating the pace of their steps in the real world. Additionally, they emphasized their spatial positioning by identifying themselves with the videographer.
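One way to realize the path-click navigation described above is to store, for each colored path, map coordinates sampled along the videographer's walk together with their video timestamps; a click on the map is snapped to the nearest sample and playback jumps to its time. The sketch below is a hypothetical illustration under that assumption — the function name, path data, and coordinates are ours, not the paper's implementation.

```python
# Hypothetical sketch: each colored path on the map is a list of
# (map_x, map_y, video_seconds) samples recorded along the videographer's
# walk. A click snaps to the nearest sample on any path, and playback
# jumps to that sample's timestamp.

def nearest_timestamp(paths, click_x, click_y):
    """Return (path_name, video_seconds) of the sample closest to a click."""
    best = None  # (squared_distance, path_name, time)
    for name, samples in paths.items():
        for x, y, t in samples:
            d2 = (x - click_x) ** 2 + (y - click_y) ** 2
            if best is None or d2 < best[0]:
                best = (d2, name, t)
    return best[1], best[2]

# Two toy color-coded paths through the exhibition space.
paths = {
    "red":  [(0, 0, 0.0), (10, 0, 12.0), (20, 0, 24.0)],
    "blue": [(20, 5, 30.0), (20, 15, 42.0)],
}

print(nearest_timestamp(paths, 19, 4))  # → ('blue', 30.0)
```

Because every map position corresponds to a time point in the stitched video, this mapping turns a temporal seek into what the viewer perceives as a spatial jump.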
While traditional VR experiences prioritize realism and embodiment, our approach, involving the stitching of multiple 360 video clips, endeavors to replicate spatial movement without necessitating full 3D representation. This methodology, integrated with the map interface and 360 video-based content, demonstrates the potential to broaden the viewer's manipulation capabilities beyond simple viewpoint adjustments and time-point shifts, providing an elevated degree of freedom for spatial manipulation within the virtual environment. Moreover, this integration holds promise for enhancing user engagement and enjoyment through active exploration when users interact with a virtual space in a desktop environment.

Assisted Autonomy and Spatial Exploration in Virtual Exhibitions
The graphical interface elements in Viewer2Explorer, such as the colored paths and coins, were designed to assist users in navigating the video by helping them understand the composition of the exhibition and the location of the exhibits. From the results, we found that these visual elements helped participants remember the paths they had already visited and motivated them to explore the entire exhibition. In addition, most participants described perceiving the different colors of the paths as different themes within the exhibition content, and interpreted the paths' order as the exhibition's original order. Most participants in the T condition, using the timeline interface, followed the original exhibition order. In the T+M condition, using Viewer2Explorer, participants instead tended to explore the exhibits they wanted to visit. These results verify that Viewer2Explorer enables viewers to explore virtual exhibition content in their own way, facilitating more independent learning within the virtual exhibition space and enabling diverse experiences through personal interpretation.
The concept of autonomy is defined as the ability of the experiential subject to have self-agency, allowing them to take control and unfold experiences according to their intentions. Autonomy involves the learner choosing to learn through self-selection [37], experiencing satisfaction in acting according to their intentions within the virtual environment [70], and feeling connected to the environment, thereby contributing to learning motivation [22]. Previous research on GLAM virtual exhibitions has attempted to provide viewers with autonomy by offering choices in unfolding the narrative, such as selecting where to go next [1,2], or by employing avatars' viewpoints to proceed through virtual tours [34]. However, despite granting a degree of autonomy, these approaches still guide viewers to some extent: they adhere to predetermined storylines, limiting complete autonomy due to the nature of the medium and its focus on the viewer.
In these previous attempts [1,2,34], viewers experienced a sense of control over their movements, even if the autonomy provided was not highly self-determined. Viewer2Explorer, in contrast, emphasizes broadening the manipulation methods to offer users a richer set of experiences. It provides assisted autonomy through visual elements such as the map and coins, allowing viewers to take control of the museum experience. These features are analogous to how visual and audio cues maintain a person's awareness of their location and orientation in a virtual environment [13]. Through the experiment, we confirmed that viewers used the interface components in Viewer2Explorer to explore the exhibition space; the experience differed from merely following the videographer's path or drifting aimlessly. The approach in Viewer2Explorer resembles the traditional museum experience, where viewers can explore chosen works and learn at their own pace within a well-organized exhibition space [31], and where the arrangement of exhibits affects visitors' gaze movements, exploratory attitudes, and interactions with exhibits [71].
In RQ3, we asked how the integration of Viewer2Explorer with 360 video-based virtual exhibitions influences users' experiences, specifically in altering perceptions and interactions when viewing the same video exhibition. Based on our insights into assisted autonomy, virtual exhibition designers can effectively convey their messages with predefined content while encouraging viewers to explore the virtual space according to their own intentions. For example, the virtual exhibition content can be divided into two kinds of sections: sections where users must follow the video flow and sections where users can explore freely. These sections can be differentiated in the interface by limiting the manipulation features or by using color coding, so that users can distinguish when they should focus on the streamed content and when they can explore and gather information more actively. By implementing this strategy, it would be possible to provide an experiential learning opportunity about the exhibits through exploratory activities that directly engage with the virtual exhibition space.
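As a concrete, entirely hypothetical illustration of this two-section strategy, exhibition segments could carry a mode flag that the interface uses to enable or restrict manipulation features. The segment, mode, and feature names below are ours, not part of Viewer2Explorer.

```python
from dataclasses import dataclass

# Hypothetical sketch of the guided-vs-free sectioning strategy described
# above. All names and values are illustrative only.

@dataclass
class Segment:
    name: str
    mode: str   # "guided": follow the video flow; "free": explore at will
    color: str  # color coding shown on the map interface

def allowed_features(segment):
    """Guided segments restrict manipulation so viewers focus on the stream;
    free segments expose the full set of spatial controls."""
    if segment.mode == "guided":
        return ["play/pause"]
    return ["play/pause", "faster", "slower", "skip", "path-jump"]

intro = Segment("curator's introduction", "guided", "gray")
gallery = Segment("main gallery", "free", "red")

print(allowed_features(intro))  # → ['play/pause']
```

Gating the feature set per segment lets the designer alternate between narrated passages and self-directed exploration within a single stitched video.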

Dual Immersion
In this research, we aimed to investigate how Viewer2Explorer influences users' active engagement and exploration in 360 video virtual exhibitions compared to linear timeline interfaces (RQ2). The study results demonstrate that expanding the range of interaction in a relatively passive medium can encourage autonomous exploration and engagement among viewers. In the user experience evaluations, participants in the T+M condition became active explorers, proactively interacting with the video exhibition and controlling the videographer's orientation, position, and movements. They immersed themselves in manipulating the videographer's movements, akin to controlling a character representing themselves in the virtual space. Consequently, we confirmed that the higher degree of freedom provided greater satisfaction than the conventional timeline interface.
Indeed, viewers using the map interface could not fully control the videographer's movements, owing to the restricted manipulation that Viewer2Explorer affords over 360 videos. Unlike in 3D virtual environments, where users can manipulate every gaze and movement, the videographer in Viewer2Explorer persistently advances in their walking and viewing direction, except when the viewer explicitly stops the video. This partially autonomous navigation, however, facilitated a transition between passive and active engagement despite not affording complete control over the videographer's predetermined movement. Calleja et al. categorize the term "immersion" into two forms: immersion as absorption and immersion as transportation [4].
Immersion as absorption involves the player becoming involved in the medium, absorbed into specific situations or actions. Immersion as transportation, on the other hand, entails players navigating through the virtual world, assimilating into that world, and interacting with entities within that shared virtual space. Our user study revealed that the imperfect and restrained spatial navigation afforded by 360 videos induced both of these types of immersion in viewers. Participants in the T+M condition felt they could manipulate their viewpoint and body movement as desired, similar to 3D environments. Despite not having direct control over every step, the continuous progression of the video allowed them to move easily to desired locations without excessive cognitive or physical burden. Consequently, 360 video viewing through the map interface enables viewers to shift between active and passive interactions, offering a nuanced spectrum of media engagement experiences. In this paper, we use the term "dual immersion" to characterize how viewers experience absorption and transportation interchangeably through active and passive interactions.
Active manipulation in a virtual tour, where the user physically delves into the environment, allows for a sense of transportation and self-discovery of unexpected elements [42,45]. In contrast, in video-watching formats of tours, the environment itself approaches the viewer, and the viewer is passively absorbed into the narrated story driving the experience [3,52]. Watching a video typically requires minimal user interaction, fostering a "lean-back" experience in which viewers passively consume content with little cognitive effort [3]. Consequently, traditional video interfaces prioritize presenting the world from the videographer's perspective rather than actively engaging users. On the contrary, virtual museums employing 3D environments and panoramic images aim to enhance visitor experiences by providing a high degree of autonomy in navigating the virtual exhibition space. Such virtual museums enable visitors to explore the virtual space independently, offering the satisfaction of freedom of movement [63]. However, excessive freedom in expansive spaces can overwhelm visitors, impeding motivated exploration and diminishing interest or causing disorientation [9,10,35]. Additionally, our user study interviews revealed that interfaces like Google Street View, where viewers must control every movement, may feel unnatural and hinder immersion due to the burden of controlling every step in the virtual environment.
These findings highlight how immersion experiences vary depending on the level of autonomy the medium provides. Previous efforts have sought to overcome the limitations of each medium: interactive video formats give viewers choices to increase autonomy and engagement, while studies of virtual exhibitions in 3D spaces have sought to balance levels of interaction, allowing users to maintain interest without being overwhelmed [8,16]. In video-based virtual exhibitions, Viewer2Explorer combines the continuous time flow of 360-degree videos with an integrated map interface, offering users the freedom to explore curated spaces. This balance between autonomy and predefined content has significant implications for designing immersive virtual exhibitions and other content consumption experiences: it shows that increasing the level of interaction within video-based experiences can create a distinctive dual-immersion experience.

Limitation and Future Work
The study has several limitations. First, we used only historical art exhibitions in our user test, which may not be representative of other areas such as science; further studies should investigate how Viewer2Explorer performs in other domains. In addition, it is necessary to explore the impact of the interface on 360 video content beyond exhibitions, such as remote tours [34] or telepresence systems [25,30]. Second, the total length of the source video was limited to about five minutes, which is relatively short. Since users might behave differently depending on the video length and the size of the map, this should be explored in the future. Third, the user experience was evaluated based on the results of surveys and interviews. To enable quantitative comparison and analysis of user experience and behavior in future studies, real-time sensor data, such as eye tracking and mouse clicks, could be collected during navigation through the map interface.
Furthermore, with the different ways of viewing video content available today (e.g., mobile, tablet PC, VR HMD), it is necessary to consider additional factors. In this paper, we have focused on a 2D screen-based interface. However, it is crucial to design the interface components and apply interaction methods according to the device, such as attention guidance [19] or input modalities [29]. For example, mobile interfaces require touch interaction and may need a zoom-in/out function due to screen size limitations, while VR HMDs offer various input methods such as controllers, hand tracking, and gaze. Thus, additional refinement of the Viewer2Explorer interface is needed to provide a more immersive and guided-autonomy experience on alternative display platforms.

CONCLUSION
This study attempted to expand the previous timeline-based virtual museum exhibition video experience into an active spatial exploration experience. We aimed to convey the joy of exploring space through nonlinear navigation rather than passively accepting a video experience. Our Viewer2Explorer interface and user study verified the effectiveness of the map interface for viewers' experience. We drew three major findings from the experiment. First, the interface successfully induced both active and passive immersion in the video-viewing experience. Second, viewers unconsciously followed the autonomy guide designed by the exhibitor and videographer. Lastly, viewers applied their real-world behavior to their use of the video interface.
Overall, these results confirm that the map interface improved viewers' spatial awareness and created a satisfying museum experience through active navigation. Furthermore, by transforming the range of video operations with the map interface, we demonstrate that 360 videos can be expanded into spatial exploration media through dual immersion and assisted autonomy. In addition, we provide design guidelines for implementing spatial exploration in virtual exhibition experiences through a map interface.
Owing to the pandemic, museums have built their own VR applications to transfer onsite experiences [27,68]. However, 360 video-based virtual exhibitions could be made more effective with our proposed minor modifications. For instance, because the Viewer2Explorer interface runs on a 2D screen video without an expensive HMD, it can be easily distributed and accessed through the web. In addition, it can expand the temporally based experience of previous video formats into a spatial experience, like a VR platform.

Figure 1 :
Figure 1: (a) Still cut of 360 museum exhibition video screen with time + map interface and (b) close-up image of the map interface for spatial navigation.

Figure 2 :
Figure 2: Overall road map of design considerations applied to the Viewer2Explorer interface.

Figure 3 :
Figure 3: The exhibition is divided into four separate paths based on the themes of its content. These paths are then combined to create a single 360-degree video exhibition. The paths are distinguished by different colors and are indicated on a map.

Figure 4 :
Figure 4: Overall structure of the map interface and three main elements: viewer indicator, coin, and color-coded path.

Figure 5 :
Figure 5: Illustration of the layout and functions used in each interface, the Timeline and the Map + Timeline interface. Row 1: Graphical User Interface (GUI) layout. Row 2: Position control interface. Row 3: The Play/Pause buttons are the same in both GUIs. Row 4: The speed control was implemented with the up and down keyboard keys. Row 5: Skip illustrates the two different functions implemented in the two interfaces.

Figure 6 :
Figure 6: (a) Illustration of the experimental setup and (b) an image of a participant conducting the user test.

Figure 7 :
Figure 7: Rating scores of spatial awareness in the T and T+M conditions. Statistically significant differences (p < .05) were observed in the sense of reality, self-orientation, and imageability. The error bars represent 95% confidence intervals. *p < .05.

Figure 8 :
Figure 8: Rating scores of autonomy in the T and T+M conditions. Statistically significant differences (p < .05) were observed in the sense of agency and perceived interactivity. The error bars represent 95% confidence intervals. *p < .05.

Figure 9 :
Figure 9: Rating scores of experience in the T and T+M conditions. Statistically significant differences (p < .05) were observed in satisfaction, engagement, and understanding of content. The error bars represent 95% confidence intervals. *p < .05.

Figure 10 :
Figure 10: Illustration of logged data from participants in T and T+M conditions: the average frequency of each function usage (left) and log of event marks of each function triggered when viewing the video (right).

Table 1 :
Descriptive statistics (Mean, SD) of each function usage and comparison results between the T (n=12) and T+M (n=12) condition groups using a paired t-test.

Table 2 :
Statistical analysis results on interface usability between two conditions, T and T+M.