Understanding User Immersion in Online Short Video Interaction

Short video (SV) online streaming has been one of the most popular Internet applications in recent years. When browsing SVs, users gradually immerse themselves and derive relaxation or knowledge. Although prolonged browsing leads to a decline in positive feelings, users continue out of inertia, resulting in decreased satisfaction. Immersion has been shown to be an essential factor in users' positive experience and to be highly related to users' interactions in film, games, and virtual reality. However, immersion in SV interaction is still unexplored, and it differs essentially from the previously studied scenarios because SV delivery is fragmented and discrete, with limited time for each video. In this paper, we aim to build an extensive understanding of user immersion in online short video interaction, including related factors, the possibility of detecting it, and its representation of satisfaction. We conduct a three-step user study on real SV browsing, including an online survey, a field study, and a lab study with EEG signals. The user study reveals that immersion is a common feeling in SV interaction, and it is related to video features, personalization of recommendations, user mood, and interaction behaviors. Specifically, prolonged browsing leads to a significant decrease in immersion. Furthermore, analyses of EEG signals demonstrate that the prefrontal lobe and parietal lobe of the gamma band are associated with immersion. Besides, immersion prediction experiments achieve encouraging results, showing that user immersion status is predictable and EEG signals do help improve prediction performance. Moreover, correlation analysis indicates that the predicted immersion is more representative of user satisfaction than user behaviors, revealing the potential of immersion as an indicator of satisfaction in the recommender system.
To the best of our knowledge, this is the first study on user immersion in real online SV interaction scenarios, and our findings are enlightening for SV users and recommender system designers.


INTRODUCTION
With the development of the Internet and mobile media, online short videos (SVs) have become an important medium for people to access information, relax, and be entertained. SVs are usually delivered to users continuously by recommendation systems. When browsing SVs, users are attracted, gradually immersing themselves in the content and deriving relaxation or knowledge. After a period, they may start to feel numb and experience positive feelings fade, yet they may continue browsing out of inertia without actively pausing, which leads to a decrease in satisfaction. Generally, users wish to be reminded when their positive engagement declines so that they can have a satisfying browsing experience.
Users' immersion plays an important role in this process. Immersion is a positive psychological experience that helps users get involved in SV content. Michailidis et al. [28] illustrated that a controlled immersion state improves the efficiency and effectiveness of knowledge acquisition and is related to a relaxed and enjoyable state. Therefore, immersion is an essential factor for people to acquire knowledge and satisfaction in SV browsing.
To better understand user immersion during SV interactions, we need to measure it first. In psychology, scales that portray psychological feelings [8] are usually used to analyze immersion quantitatively. However, scales intrude on the user's experience and usually have to be filled in from recall after the experience. On the other hand, electroencephalogram (EEG) provides a new way to measure immersion and compensates for the limitations of scales, as EEG can collect real-time data on users' status. Additionally, EEG data are rich in spatial, temporal, and frequency band information about human experience [34] and can be used to investigate the underlying neural mechanisms of immersion. Therefore, we combine scales and EEG to gain an in-depth understanding of user immersion in SV interactions.
Given the importance of the immersion state for user satisfaction and the availability of suitable measurements, we aim to understand user immersion status during SV browsing. There have been studies on immersion in films [35], games [18], and virtual reality (VR) [2]. However, their findings and methods cannot be directly applied to SV streaming, because SV browsing is fragmented and moves through discrete pieces of information. Additionally, SV browsing primarily involves passive information reception, distinguishing it from games. To the best of our knowledge, this is the first in-depth study on user immersion in online SV interaction scenarios. In this paper, we try to answer the following three research questions:
RQ1. Does immersion occur while interacting with online short videos, and what are the highly related factors?
RQ2. How are brain signals during short video browsing associated with immersion?
RQ3. Is user immersion predictable with EEG-based brain signals? If so, can the predicted immersion be a promising measure of user satisfaction?
To address the questions, we conduct a three-step study on SV immersion (a broad user survey, a field study, and a lab study with EEG signals) with ethics approval and privacy protection.
For RQ1, we aim to anatomize the phenomenon of immersion in the SV scenario. We design a scale to measure immersion for SV interactions. We find that video and audio features, personalization of recommendation, user mood, and user behaviors are related to immersion. It is worth noting that immersion increases with browsing time at first (generally within 1-2 hours), but further continuous browsing brings a significant reduction of immersion, which corresponds to the above-mentioned changes in browsing status.
We collect EEG signals in the lab study to answer RQ2. We find that high immersion is associated with the frontal lobe and parietal lobe of the gamma band. Meanwhile, preference, personalization, and mood affect the perception of immersion in brain signals.
As for RQ3, we conduct experiments to predict immersion. Adding EEG significantly improves the prediction performance, demonstrating the potential of EEG signals in SV interactions. Moreover, the predicted immersion has a higher correlation with user satisfaction than behavior and mood metrics, which shows the possibility of leveraging immersion as an indicator of satisfaction in the recommender system.
The main contributions of this work are as follows:
• This is the first study on short video interaction immersion with EEG in real online scenarios. Immersion is commonly experienced in short video interaction, and there are key influential factors, including short video features, personalization, user mood, and interaction behavior.
• We reveal that EEG has a significant correlation with immersion in the parietal and frontal lobes in the gamma band. Moreover, the user immersion status is predictable with EEG signals, which suggests that EEG helps recognize and predict user immersion.
• We demonstrate the possibility of leveraging the predicted immersion as an indicator of user satisfaction in recommender systems, as it is more representative of users' satisfaction than commonly used behavior metrics and emotion labels.

RELATED WORK
2.1 Previous Study on Immersion
From the perspective of positive psychology, people experience immersion when consuming media (e.g., computer games, films, and virtual reality (VR)) [6,16]. Psychologists commonly use scales to measure the degree of immersion reported by users. Jennett et al. [18] developed a scale specifically for measuring immersion in games, which includes dimensions of losing track of time, detachment from the real world, and a deep sense of engagement in the virtual environment. In the realm of video media, a scale called 'the film IEQ' [35] has been designed to assess immersion. This scale comprises captivation, real-world dissociation, comprehension, and transportation factors. However, due to the differences in context, scales cannot be easily transferred. These studies have not explored the concept of immersion specifically within online short video streaming.
Considering the neural correlates, immersion requires increased mental effort in a task, thus functionally evolving over time [28]. Concentration is a key aspect of immersion, and EEG and fMRI studies have shown activation in the frontal and parietal cortices during concentrated states [38]. Lim et al. [22] defined concentration as focusing on a red dot and immersion as playing a computer game, revealing decreased alpha waves and increased theta waves. Visch et al. [42] examined immersion in film and found a positive correlation between immersion and intensity of emotions.
There is limited research on immersion specifically focused on SVs. The closest study was conducted by Su et al. [39], who investigated neural activity using functional MRI while participants watched SVs. However, due to environmental constraints, the videos were pre-recorded and played during the formal experiment, resulting in a significant gap from real-life scenarios.
In summary, immersion has been studied in VR, gaming, and films. However, few studies have focused on the SV context. Our study is the first to investigate immersion in the context of SVs in real-life scenarios. With reference to previous immersion scales [18,35], we design the online SV immersion scale. Distinct from previous work, we add field studies to capture immersion in real scenarios. Through a comprehensive three-step experiment, we gradually gain a deep understanding of immersion in SV interactions.

Study on User Behaviors and Experiences in the Personalized Recommendation
Click behavior has been widely used as essential implicit feedback in recommender systems and is taken as an indicator of user preference. However, user clicks may not always accurately reflect their actual experiences and satisfaction, leading to potential biases [19,23]. To address this issue, dwell time has been employed. Some works use dwell time for recommendation of TV programs [46], hotels [26], and online news streams [24].
In addition to user behavior, researchers recognize the importance of modeling users' subjective experiences, such as satisfaction, when developing new recommender systems [37]. There is a significant gap between clicks and user satisfaction [25,44]. This disparity has motivated researchers to explore factors that influence user satisfaction, such as quality, entertainment value, and usefulness [33]. Emotions also play a crucial role in human decision-making [31,32] and have become a research variable in context-aware recommender systems. Li et al. [21] proposed a multi-task model to predict music preference with mood prediction as an auxiliary task. Tkalcic et al. [40] proposed a unified framework that locates existing research efforts in a three-stage model of emotions used for recommendations.
To fully capture the user experience, relying solely on click and dwell time observations is insufficient. Emotion is one aspect of the experience, and similarly, immersion is a crucial component of users' interactions within the recommendation system.

EEG in User Interaction Study
A growing body of research uses EEG signals for a variety of applications, such as user input [7] and monitoring the psychological metrics behind human behavior [32]. Researchers have also explored the use of EEG signals in information retrieval, leveraging them as user feedback [14]. EEG signals have been used to recognize and analyze human moods [1]. The impact of entertainment on cognition has also been studied using EEG. Wan et al. [43] measured the impacts of virtual reality games on cognitive ability using EEG signals, while Feng et al. [13] investigated the influence of fragmented reading with EEG.
These studies demonstrate the wide-ranging potential of EEG across different scenarios. Consequently, we use EEG signals to provide valuable insights into user immersion in online short-video scenarios, as shown in our experiments in Section 5.

TASK DEFINITION AND METHODOLOGY
3.1 Definition of Immersion
Immersion refers to the feeling of being highly concentrated in an experience with the following four characteristics: lack of time awareness, a sense of transportation to another reality, captivation, and emotional involvement [29].
A high level of immersion is often strongly associated with flow [18]. Flow is a positive experience in which individuals concentrate on an activity and improve their performance and experience [9]. Besides, it is worth noting that immersion exhibits distinct physical behaviors and experiences compared to negative experiences such as addiction. Addiction is a disorder of overdependence on things accompanied by a withdrawal response and is closely associated with reward learning in the human brain [17,30]. In contrast, immersion describes an engaged experience and is related to attention. A high level of immersion makes a person more productive and more efficient in processing information.
We focus on user immersion in SV interaction. To address the RQs in Section 1, we conduct a three-step study: a broad user survey on SV immersion, an in-depth field study, and a lab study with EEG signals on immersion in real SV browsing. The survey offers a comprehensive understanding of immersion in SVs. The field study captures immersion in real scenarios, while the lab study introduces EEG to enable more in-depth analysis. The overall process is illustrated in Figure 1, and the research methodology is described as follows.

Step1: User Survey with Questionnaires for Immersion in Short Video
We designed a questionnaire survey to understand user immersion in SV interaction in general. The questionnaire consists of 3 sections: demographic collection, SV usage habits, and an immersion scale for the last SV interaction session. The immersion scale measures the level of immersion of users during an SV interaction session. The scale comprises four dimensions: time perception, attraction, emotional involvement, and real-world dissociation. Our survey results verified its internal consistency (Cronbach's alpha=0.84) and suitability for factor analysis (Kaiser-Meyer-Olkin value=0.86). The respondents were recruited through online social media. 294 responses were collected, and 214 of them are valid according to the attention-detection questions. The survey does not involve youth under the age of 18. The gender proportion is male:female=116:98, and the distribution of age is 18-30:30-40:over 40 = 174:33:7. We also asked respondents about their SV usage experience. 84.6% of respondents have been using SVs for more than one year. As for the frequency of SV usage, 80.4% of people browse SVs every day, while 7.0% of people sometimes or seldom browse SVs.
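As an illustration of the reliability check reported above, the following is a minimal sketch of how Cronbach's alpha can be computed from a respondents-by-items score matrix. The function name `cronbach_alpha` and the toy responses are hypothetical and are not the survey data.

```python
import numpy as np

def cronbach_alpha(item_scores) -> float:
    """Cronbach's alpha for a (respondents x items) matrix of scale scores."""
    item_scores = np.asarray(item_scores, dtype=float)
    n_items = item_scores.shape[1]
    # Sample variance of each item, and of the per-respondent total score.
    item_variances = item_scores.var(axis=0, ddof=1)
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return n_items / (n_items - 1) * (1.0 - item_variances.sum() / total_variance)

# Hypothetical toy data: 6 respondents answering 4 Likert items (1-5).
toy_responses = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
])
alpha = cronbach_alpha(toy_responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Values closer to 1 indicate that the items vary together, which is the sense in which the reported alpha of 0.84 supports summing the items into one score.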

Step2: User Field Study on Daily Life Short Video Browsing
In order to investigate session-level immersion in real-life settings and collect user preferences for the lab study, we conducted the field study. We recruited 30 participants from a public university, of whom 14 were female and 16 were male, aged 18-28 (mean M=22.17, standard deviation SD=2.20). All participants are right-handed and do not suffer from any neurological disease. Every participant signed a consent form before the experiment. The field study and lab study shared the same group of participants, who had the option to quit at any time. Upon completion of the experiments, participants received approximately $60 in research compensation.
During the one-week field study, the participants were required to browse SVs on a popular SV platform, where all items (i.e., SVs) were recommended to them by the platform recommender system. They browsed SVs in their spare time for one week, following their habits. After every viewing session, participants recorded the viewing time and filled out the immersion scale.

Step3: User Lab Study with EEG on Short Video Browsing
To collect video-level data with EEG signals and conduct further in-depth analysis, we carry out a lab study.
In the lab study, participants browsed SVs, with every session comprising a 15-minute browsing stage and a 10-minute labeling stage. Then, participants had a 5-minute break before proceeding to the next session. Throughout the SV browsing process, participants were allowed to interact with the videos by clicking the 'like' button and swiping away the video at any time during playback. Electroencephalogram (EEG) physiological signals were continuously collected. In the labeling stage, an immersion scale and a satisfaction scale were filled out. Besides, participants self-reported immersion, mood (in terms of valence and arousal), and satisfaction ratings from 1 to 5 for every video.
We set up four settings to recommend SVs to participants: non-personalized, random, personalized, and mixed. The platform offers the personalized and non-personalized settings, both influenced by the platform's strategy. For the random setting, we shuffled videos, which were uniformly selected from the platform's video pool and categorized into three groups based on view counts. To ensure diversity and appropriateness, we filtered out 25 videos from each group. We also conducted a mixed setting in which a 1:1 random mix of personalized and randomized videos was recommended to participants. Every participant viewed a total of 4 to 5 sessions. For EEG signals, we applied a 64-channel Quik-Cap (Compumedics NeuroScan) with channels placed based on the International 10-20 system. After the experiment, we pre-processed the EEG data by re-referencing to averaged mastoids, baseline correction, low-pass filtering at 50 Hz, high-pass filtering at 0.5 Hz, and artifact removal.
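The pre-processing chain named above can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline used in the study: the filter type and order (a 4th-order zero-phase Butterworth), the baseline window length, the mastoid channel indices, and the function name `preprocess_eeg` are all hypothetical, and artifact removal is omitted.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def preprocess_eeg(raw, mastoid_idx, fs=1000.0, baseline_samples=200):
    """Re-reference, baseline-correct, and band-limit a (channels x samples) array."""
    raw = np.asarray(raw, dtype=float)
    # Re-reference: subtract the average of the mastoid channels from every channel.
    reference = raw[list(mastoid_idx)].mean(axis=0, keepdims=True)
    data = raw - reference
    # Baseline correction: subtract each channel's mean over an assumed initial window.
    data = data - data[:, :baseline_samples].mean(axis=1, keepdims=True)
    # 0.5 Hz high-pass and 50 Hz low-pass, realized here as one zero-phase band-pass.
    sos = butter(4, [0.5, 50.0], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, data, axis=1)
```

A call such as `preprocess_eeg(raw, mastoid_idx=[32, 42])` would return an array of the same shape; the indices shown are placeholders that depend on the cap layout.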

Ethics and Privacy
This study has been reviewed and approved by the Department of Psychology Ethics Committee, Tsinghua University (THU202118). All users who participated in the study were made aware of the whole research process and the usage of the results before the experiments and signed an Informed Consent Form. The users have authorized the usage of their interaction data during the experiments. The experiments caused no harm to the participants.

THE ANATOMY OF IMMERSION IN SHORT VIDEO (RQ1)
In this section, we report the analysis that answers RQ1. First, we present the results of the immersion scale to show the dimensions of immersion in SV browsing. Further, we provide a statistical analysis of four factors related to immersion in SV: video and audio features, personalization settings of recommendation, user mood, and interaction behaviors.

Distributions of Immersion in Different Dimensions
Following the scale development method [8], the immersion score of the scale is calculated by summing the scores of every item. Based on the survey, we divided the immersion scores into three levels.
To ensure the number of people at each level is the same, we regard scores less than 57, 57-65, and greater than 65 as corresponding to low, medium, and high levels of immersion, respectively. For every immersion dimension (see Section 3.2), we plot the distribution of the average dimension score at the three levels of immersion (low, medium, and high), as shown in Figure 2(a)-(d).
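The level assignment described above can be sketched as below. The cut points 57 and 65 come from the text; the helper names and the quantile-based alternative are illustrative assumptions about how such balanced cut points might be derived.

```python
import numpy as np

def split_into_levels(scores, low_cut=57, high_cut=65):
    """Label total scale scores: 'low' (< low_cut), 'high' (> high_cut), else 'medium'."""
    scores = np.asarray(scores)
    levels = np.full(scores.shape, "medium", dtype=object)
    levels[scores < low_cut] = "low"
    levels[scores > high_cut] = "high"
    return levels

def tertile_cuts(scores):
    """Assumed data-driven alternative: cut points at the 1/3 and 2/3 quantiles."""
    return np.quantile(np.asarray(scores, dtype=float), [1 / 3, 2 / 3])

# Hypothetical total scores for four respondents.
print(list(split_into_levels([50, 57, 65, 66])))
```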
Relatively speaking, high immersion is characterized by a more noticeable presence of time perception, engagement, and emotional involvement than medium and low immersion. However, real-world dissociation is not as prominent. This intriguing finding also indicates that strong emotional involvement, attraction, and fluctuations in time perception are closely related to immersion, while the sense of real-world dissociation is less related.

Relations between Immersion and Video and Audio-Related Factors
To find out what kind of video would be more immersive, we recorded all the videos viewed by the participants during the lab study and extracted both audio and video features. Specifically, we compute audio features from the ComParE2016 acoustic feature set [36] using openSMILE [11] at the original sampling rate and calculate the average of video features over frames (segmented by second). Then, we conduct feature normalization and calculate the difference between the average feature values of the high- and low-immersion groups (shown in Figure 3).
Our analysis revealed that higher shimmer and lower energy, pitch, harmonics-to-noise ratio, and resonance in the audio features are significantly related to a more immersive experience. Besides, there is a significant correlation between immersion and the Laplacian variance, contrast, and color cast of the video.
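To make the frame-level video features more concrete, the sketch below computes two of them on grayscale frames and averages them over a clip. The exact definitions used in the study are not given in the text, so the 4-neighbour Laplacian, the RMS definition of contrast, and all function names here are assumptions.

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of a 4-neighbour discrete Laplacian; larger values suggest sharper frames."""
    g = np.asarray(gray, dtype=float)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def rms_contrast(gray):
    """Root-mean-square contrast: the standard deviation of pixel intensities."""
    return float(np.asarray(gray, dtype=float).std())

def video_feature_means(frames):
    """Average the per-frame features over a clip of 2-D grayscale frames."""
    feats = np.array([[laplacian_variance(f), rms_contrast(f)] for f in frames])
    return feats.mean(axis=0)

# Hypothetical frames: a smooth horizontal ramp and a checkerboard.
ramp = np.tile(np.arange(8.0), (8, 1))
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
print(video_feature_means([ramp, checker]))
```

The ramp has contrast but no local edges, while the checkerboard has strong edges, which is why the two features carry different information.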
The findings provide insight into the relationship between video content and user immersion in the context of SV browsing. They suggest that specific SVs are not solely responsible for immersion; rather, diverse videos have the potential to evoke immersion. This suggests that recommenders can provide diverse SVs to users, promoting a more engaging viewing experience.

Relations between Immersion and Personalization Settings
During the lab study, we used four settings to show the SVs: non-personalized recommendation, random display, personalized recommendation, and a mix of random and personalized videos. We analyze the differences in immersion among the four settings in Figure 4. The non-personalized and randomized settings have a similar score distribution, while the personalized and mixed settings are similar to each other. Personalized recommended videos bring higher immersion than random videos.
Non-personalized sessions show a higher frequency of low immersion scores, while scores of 5 are less common, indicating a lower likelihood of eliciting a sense of immersion in users. In contrast, randomized sessions display a higher occurrence of scores around 3 compared to non-personalized sessions. Their entropy is higher than that of the non-personalized setting, suggesting that identifying immersion in a randomized setting is somewhat more challenging. Personalized sessions demonstrate a noticeable reduction in low immersion scores and a significant increase in high immersion scores, indicating that personalization significantly enhances users' psychological experience of immersion. Mixed sessions show the same pattern, suggesting that including a certain proportion of randomized videos does not substantially impact the overall perception of immersion, thereby leaving room for exploring user interests.
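The entropy comparison between settings can be illustrated with a small sketch. It assumes Shannon entropy (base 2) over the five rating levels; the text does not state the logarithm base or estimator, and the function name and toy ratings are hypothetical.

```python
import numpy as np

def rating_entropy(ratings, levels=(1, 2, 3, 4, 5)):
    """Shannon entropy (in bits) of the empirical distribution of immersion ratings."""
    ratings = np.asarray(ratings)
    counts = np.array([(ratings == lv).sum() for lv in levels], dtype=float)
    probs = counts / counts.sum()
    probs = probs[probs > 0]  # treat 0 * log(0) as 0
    return float(-(probs * np.log2(probs)).sum())

# Hypothetical ratings: one spread-out session and one concentrated session.
spread = [1, 2, 3, 3, 4, 5]
concentrated = [1, 1, 1, 1, 2, 1]
print(rating_entropy(spread), rating_entropy(concentrated))
```

A flatter distribution of ratings yields a higher entropy, which matches the reading that ratings in such a setting are harder to anticipate.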

Relations between Immersion and Mood-Related Factors
During the interaction, user moods are influenced, which is potentially related to immersion. Users self-reported valence (negative to positive) and arousal (low to high energy) from 1 to 5 for every video they viewed in the lab study. The Pearson correlation coefficient between valence and immersion is 0.50 (p-value<0.05), and the Pearson correlation coefficient between arousal and immersion is 0.67 (p-value<0.05). We then categorized interactions into five groups based on the mood scores, and the box plots of immersion scores for each group are presented in Figure 5.
We found that positive emotions and high-arousal emotions are significantly associated with high immersion. Immersion, as a high-level cognitive and affective experience, is positively correlated with emotions. This suggests that when recommender systems cater to user emotions, they could also strive to provide users with positive immersive experiences to a certain extent.

Relations between User Behaviors and Immersion
User behaviors may reveal their immersion status. As user immersion changes dynamically over time, we explore whether longer browsing time leads to higher levels of immersion. Figure 6 (a) and (b) show the relationship between immersion and the duration of continuous browsing in the broad user questionnaires and the field study. The survey results show that, within a certain browsing time (generally 1-2 hours), the longer the browsing time, the higher the level of immersion. However, there is a limit to immersion. Continuous browsing for an extended time brings about a significant reduction of immersion. The field study showed the same phenomenon. This means that a longer dwell time does not correspond to a better immersion experience. Therefore, when immersion starts to decline after prolonged browsing, browsing can be interrupted to prevent a bad experience.
As conveniently accessible video-level behaviors, liking and view ratio (play time divided by video duration) are commonly used as explicit and implicit user feedback, respectively, in recommender systems. As shown in Figure 6 (c) and (d), a higher immersion score corresponds to a higher percentage of liking and a higher average view ratio.
The distribution of view ratio for the immersion scores of 1 and 5 is more concentrated than that for the middle scores. We found a positive correlation between immersion and both liking and view ratio.
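The two behavior metrics can be computed per immersion score as in the sketch below. The record fields (`immersion`, `play_time`, `duration`, `liked`) and the toy interactions are hypothetical and only illustrate the definitions given above.

```python
from collections import defaultdict

def behavior_by_immersion(interactions):
    """Group interactions by immersion score and average view ratio and like rate."""
    groups = defaultdict(list)
    for item in interactions:
        view_ratio = item["play_time"] / item["duration"]  # play time divided by video duration
        groups[item["immersion"]].append((view_ratio, 1.0 if item["liked"] else 0.0))
    summary = {}
    for score, rows in sorted(groups.items()):
        n = len(rows)
        summary[score] = {
            "mean_view_ratio": sum(r[0] for r in rows) / n,
            "like_rate": sum(r[1] for r in rows) / n,
        }
    return summary

# Hypothetical interactions (times in seconds).
toy_interactions = [
    {"immersion": 5, "play_time": 30.0, "duration": 30.0, "liked": True},
    {"immersion": 5, "play_time": 20.0, "duration": 40.0, "liked": False},
    {"immersion": 1, "play_time": 3.0, "duration": 30.0, "liked": False},
]
behavior_summary = behavior_by_immersion(toy_interactions)
print(behavior_summary)
```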
Answer to RQ1: Yes, immersion exists in online short video interaction, and it is closely related to short video features, personalization of recommendations, user mood, and user behaviors. Moreover, it is worth noting that longer browsing durations do not lead to higher immersion. After a period of time, immersion tends to decline when users continue to browse the SVs.

BRAIN SIGNALS WITH IMMERSION IN SHORT VIDEO BROWSING (RQ2)
Immersion is closely associated with psychological experiences and brain activities [34]. Electroencephalogram (EEG) provides a real-time and data-rich tool to measure user immersion during SV interaction, which compensates for the deficiencies of post hoc scales. Therefore, we endeavor to characterize the EEG manifestations of immersion to gain a deeper understanding of it. Differential entropy (DE) is known as a useful feature for characterizing the complexity and information content of EEG signals [10]. After collecting EEG signals in the lab study, we compute the DE of five frequency bands (delta: 0.5-4 Hz, theta: 4-8 Hz, alpha: 8-12 Hz, beta: 12-30 Hz, and gamma: 30-45 Hz) of the time series at each electrode.
To compute DE, we first estimate the power spectral density (denoted as P(f), with f as frequency) using Welch's method [45] (the sampling frequency is 1000 Hz), which is based on a sliding window (the window length is 2 divided by the lower bound of the frequency band). After normalization for each band, we calculated DE as follows:
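The displayed equation is not present in this copy of the text. One reading consistent with the description (a PSD normalized within each band) is the entropy of the normalized spectrum, DE = -sum_f p(f) log p(f), where p(f) is P(f) divided by its sum over the band. The sketch below implements that reading and should be taken as an assumption rather than the exact formula of the study. Zero-padding the FFT to a common frequency grid (the hypothetical `grid_seconds` parameter) is also an added assumption, so that the short windows of the higher bands still yield several bins per band.

```python
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 45)}

def band_entropy(signal, low, high, fs=1000.0, grid_seconds=4.0):
    """Entropy of the band-normalized Welch PSD of one electrode's time series."""
    # Window length in samples: 2 divided by the lower band edge, in seconds.
    nperseg = min(len(signal), int(round(2.0 / low * fs)))
    nfft = max(nperseg, int(round(grid_seconds * fs)))  # assumed zero-padding
    freqs, psd = welch(signal, fs=fs, nperseg=nperseg, nfft=nfft)
    in_band = (freqs >= low) & (freqs <= high)
    p = psd[in_band] / psd[in_band].sum()  # normalize within the band
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def de_features(signal, fs=1000.0):
    """Return one entropy value per frequency band."""
    return {name: band_entropy(signal, lo, hi, fs) for name, (lo, hi) in BANDS.items()}
```

Under this reading, a signal whose power is concentrated at one frequency within a band yields a lower value than broadband noise in the same band.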

EEG Representation of Immersion
To explore the correlation between EEG and immersion, we compute the Pearson correlation coefficients between the DE feature and the immersion of each user-item pair. Figure 7(b)-(f) shows the correlations, with significantly (p < 0.01) correlated electrodes highlighted (i.e., with a white circle).
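The electrode-wise analysis can be sketched as follows for a single frequency band. The array layout, the function name `electrode_correlations`, and the synthetic data in the usage note are hypothetical; the sketch applies the p < 0.01 threshold per electrode and does not add any multiple-comparison correction, since none is described in the text.

```python
import numpy as np
from scipy.stats import pearsonr

def electrode_correlations(de, immersion, alpha=0.01):
    """Correlate each electrode's DE with immersion ratings.

    de: (interactions x electrodes) DE values for one band.
    immersion: (interactions,) immersion ratings for the same user-item pairs.
    Returns the coefficients, the p-values, and a mask of significant electrodes.
    """
    de = np.asarray(de, dtype=float)
    immersion = np.asarray(immersion, dtype=float)
    r = np.empty(de.shape[1])
    p = np.empty(de.shape[1])
    for ch in range(de.shape[1]):
        r[ch], p[ch] = pearsonr(de[:, ch], immersion)
    return r, p, p < alpha
```

The boolean mask corresponds to the electrodes that would be highlighted on a topographic map of the coefficients.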
In the low-frequency bands (delta, theta, alpha; Figure 7 b-d), immersion has a negative correlation with EEG signals, indicating a relationship with the suppression of low-frequency EEG activity. Specifically, the alpha band, which is typically associated with a calm, closed-eye state [3,41], demonstrates a significant negative correlation with immersion in the occipital lobe (Figure 7 d), aligning with previous findings. Moreover, existing research has highlighted an increase in delta wave activity in the frontal lobe during unconsciousness [27]. Our observations reveal a significant negative correlation between immersion and the delta wave in the frontal lobe (Figure 7 b), suggesting that immersion represents a psychological state in contrast to unconsciousness.
The gamma band has been identified as important for learning and memory processes [15] and has also shown correlations with meditation [12]. We found that immersion in browsing SVs was significantly positively correlated with the gamma wave in the prefrontal lobe and negatively correlated with the gamma wave in the parietal lobe (Figure 7 f). Gamma activity in the prefrontal lobe is associated with emotional perception, evaluation, and responses [47], indicating a close relationship between immersion and emotional experiences. The parietal lobe plays a crucial role in attentional control and cognitive regulation [5], and the specific relationship is highly complex. The significant relationship between the sense of immersion and gamma wave activity in the parietal lobe suggests the involvement of higher-level cognition during immersion.
In summary, high immersion is associated with the suppression of low-frequency bands, increased activation in the frontal lobe, and lower energy in the parietal lobe of the gamma band. Monitoring the differential entropy (DE) signals at these specific locations can offer valuable insights into the user's immersion state, thereby showcasing the potential of EEG for predicting immersion levels.

EEG Associated to Immersion with Mood
In Section 4.4, we found a positive correlation between immersion and mood. To explore how moods influence immersion perception, we divided interactions in the lab study into two groups based on the valence and arousal of mood, and investigated EEG and immersion in each group. Figure 8 illustrates the Pearson correlations between immersion and the DE of electrodes in frequency bands.
Since high-frequency brain waves (beta and gamma bands) are associated with cognition and psychological experiences, we focus on these bands in the subsequent analysis.
For both positive and negative moods, immersion showed a positive correlation with the prefrontal lobe in the gamma band (Figure 8 b,d). In positive mood, immersion was significantly positively correlated with the beta and gamma bands (Figure 8 c,d), whereas it showed a non-significant negative correlation in negative mood (Figure 8 a,b). Notably, immersion was more easily captured by EEG signals in positive mood.
Immersion was significantly negatively correlated with the beta and gamma bands in the parietal lobe in low-arousal mood. The parietal lobe is associated with high-level cognitive processes such as external spatial attention [4]. This suggests that immersion in calmness may pose challenges in eliciting perceptions of the external spatial environment. In high-arousal mood, immersion is significantly positively correlated with the prefrontal lobe as well as with the visually related occipital lobe. It is speculated that high immersion in high-arousal moods is closely related to the visual cortex and emotion regulation.

EEG Associate to Immersion with Personalization and Preference
To find out how personalization affects the relationship between EEG and immersion, we compute the Pearson correlation between the DE and immersion of each interaction grouped by personalized and non-personalized settings (Figure 9 a-d).Personalized and nonpersonalized videos show positive correlations in the prefrontal lobe of the gamma band (which is associated with emotion regulation).This correlation is particularly significant in personalized settings.Immersion and DE in high waves are positively correlated in personalized scenarios, whereas a significant negative correlation is observed in non-personalized scenarios.This phenomenon, to the best of our knowledge, has not been extensively studied in the field of neuroscience, and further investigation is warranted to understand its underlying causes and implications.The behavior of liking shows the user preference for a video.So we conduct correlation analysis under the groups of like and view without like (Figure 9 e-h).The prefrontal lobe of the gamma band was positively correlated with immersion regardless of whether the video was liked or non-liked, presumably with a consistent pattern of immersion and mood-related regulation across preferences.When viewing liked videos, immersion was positively correlated with DE in multiple brain regions within the high-frequency band, specifically, significantly in the beta and gamma bands of the parietal and prefrontal lobes.However, these brain regions did not show significant activation during the viewing of non-liked videos.Thus, we can assume that the neuronal activity of immersion is better perceived when viewing videos that induce liking.
Answer to RQ2: Brain signals, especially in the prefrontal and parietal lobes, are significantly associated with immersion when users browse short videos, and the association is influenced by mood, recommendation settings, and user preference. High immersion inhibits DE in low-frequency bands and, in the gamma band, is positively correlated with the prefrontal lobe (related to emotion regulation) and negatively correlated with the parietal lobe (associated with spatial attention). The correlation between immersion and EEG is influenced by mood. The positive correlation between the prefrontal lobe in the gamma band and immersion is consistent across preference and personalization conditions.

EEG-AWARE IMMERSION PREDICTION (RQ3)
Previous sections show that video features and user interactions are important factors for immersion. Further, we find that EEG signals are also related to immersion. Thus, we attempt to employ video features, user features, and EEG signals to predict user immersion and answer RQ3.

Experiment Settings
We have three settings with different features as input: (1) user-item features (ui): We use user-related features and item-related features as the baseline. Specifically, the video and audio features are mentioned in Section 4.2. Gender, age, and usage years of SVs serve as user-related features. (2) EEG: We employ EEG signals to predict immersion. The EEG signals are collected when the user is browsing SVs. We extracted Differential Entropy (DE) at every electrode over five passbands. (3) ui+EEG: We combine the baseline features and EEG features to jointly predict the immersion.
Table 1: Performance of immersion prediction using user-item pair features (ui), EEG signals, and ui+EEG. 5-fold validation is conducted and a two-sided t-test is performed. */** indicate p-value < 0.05/< 0.01, compared with the ui group. Bold indicates the best results. ↑ means the higher the better, and ↓ means the lower the better.
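As an illustration of the EEG features in setting (2), the sketch below computes DE for signals that are assumed to be already band-pass filtered. It relies on the common Gaussian assumption, under which the DE of a band-limited segment reduces to 0.5 ln(2πeσ²); the text does not state which estimator was used, so this formula, the input layout, and the electrode name `Fp1` are assumptions for illustration only.

```python
import math
import statistics


def differential_entropy(band_signal):
    """DE of a band-limited segment under a Gaussian assumption:
    0.5 * ln(2 * pi * e * variance)."""
    variance = statistics.pvariance(band_signal)
    if variance <= 0:
        raise ValueError("signal must have positive variance")
    return 0.5 * math.log(2 * math.pi * math.e * variance)


def de_feature_vector(filtered):
    """Flatten {electrode: {band: samples}} into a fixed-order feature list.

    Electrodes and bands are visited in sorted order so that every
    interaction yields features in the same positions.
    """
    features = []
    for electrode in sorted(filtered):
        for band in sorted(filtered[electrode]):
            features.append(differential_entropy(filtered[electrode][band]))
    return features


# Toy input: one hypothetical electrode, two bands, made-up samples.
filtered = {"Fp1": {"gamma": [-1, 1, -1, 1], "beta": [-2, 2, -2, 2]}}
features = de_feature_vector(filtered)  # order: beta, then gamma
```

For the ui+EEG setting, a vector like `features` would simply be concatenated with the user and item features before being passed to the model.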
As the first step to reveal whether the immersion status is predictable, we use three classical machine learning algorithms (SVM, XGBoost, and MLP), which are able to naturally handle mixed types of features and have good predictive power. More complex and powerful approaches can be studied and proposed in the future.
We aim to predict user immersion in the lab study dataset. We treat immersion prediction as both a regression task and a classification task. For the regression task, we predict immersion ratings (1-5). For the classification task, ratings of 1-3 are treated as low immersion and ratings of 4-5 as high immersion. We repeat 5-fold cross-validation five times. We carefully tuned all hyper-parameters of each model and setting, and recorded the best results. The performance is measured by Mean Squared Error (MSE) for the regression task and accuracy for the classification task.
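The evaluation protocol above (label binarization, five repetitions of 5-fold cross-validation, MSE and accuracy) can be sketched as follows. This is a hedged illustration rather than the study's pipeline: a placeholder predictor (training-set mean rating and majority class) stands in for SVM, XGBoost, and MLP, the shuffling scheme and seed are assumptions, and the ratings are made up.

```python
import random


def binarize(rating, threshold=3):
    """Map a 1-5 immersion rating to 0 (low, 1-3) or 1 (high, 4-5)."""
    return 0 if rating <= threshold else 1


def repeated_kfold_indices(n_samples, k=5, repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]
        for i, test_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train_idx, test_idx


def mse(y_true, y_pred):
    """Mean squared error for the regression task."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of correct labels for the classification task."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(ratings, k=5, repeats=5, seed=0):
    """Average MSE and accuracy of the placeholder predictor over all splits."""
    mse_scores, acc_scores = [], []
    for train_idx, test_idx in repeated_kfold_indices(len(ratings), k, repeats, seed):
        train = [ratings[i] for i in train_idx]
        test = [ratings[i] for i in test_idx]
        mean_rating = sum(train) / len(train)  # placeholder regressor
        train_labels = [binarize(r) for r in train]
        majority = int(sum(train_labels) * 2 >= len(train_labels))  # placeholder classifier
        mse_scores.append(mse(test, [mean_rating] * len(test)))
        acc_scores.append(accuracy([binarize(r) for r in test], [majority] * len(test)))
    return sum(mse_scores) / len(mse_scores), sum(acc_scores) / len(acc_scores)


# Toy immersion ratings (made-up, NOT data from the study).
ratings = [1, 2, 3, 4, 5, 3, 4, 2, 5, 4, 3, 1, 4, 5, 2, 3, 4, 4, 5, 3]
mean_mse, mean_acc = evaluate(ratings)
```

Replacing the two placeholder lines with a fitted model's predictions on the test fold gives the per-fold scores on which a two-sided t-test between settings could then be run.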

Immersion Prediction Results
The results are shown in Table 1. Using only user and item information, such as video characteristics and user age, already leads to encouraging results, with more than 69% accuracy and an MSE below 1.1. Adding EEG to the ui features provides significant improvements in predicting immersion across all three models. In some cases, using only EEG as input produces better results than using only ui features. This suggests that EEG can carry richer information than user and item features, which the models can effectively leverage. The best result is achieved on XGBoost by combining ui and EEG as input, with an accuracy of 70.12% and an MSE of 1.05.
With a combination of ui and EEG features as input, XGBoost showed the best performance, followed by SVM and MLP. With such a small sample size, a neural network that simply concatenates EEG features does not yield optimal results; a more refined model design is required to achieve better results. The models in this study are relatively simple, yet they have yielded encouraging results, highlighting the significant role of EEG in understanding and predicting immersion.

Immersion as a Satisfaction Indicator
The promising results for immersion prediction show that immersion is predictable. As user satisfaction is the core objective of a recommender system, we further explore whether the predicted immersion can indicate user satisfaction.
Generally speaking, behavior metrics such as clicking and liking are used to represent user satisfaction [23]. Some research has also taken mood as an important indicator of satisfaction in recommender systems [21]. Thus, we analyze and compare the correlation between the aforementioned metrics and satisfaction.
We analyze satisfaction at both the session level and the video level in the lab study, as we collected user satisfaction ratings for each video and session (details in Section 3.4). True immersion and mood were reported by the participants for each SV, while liking and view ratio are video-level behaviors. Predicted immersion scores are obtained by aggregating the regression results from the five-fold cross-validation with XGBoost. We compute the Pearson correlation between each metric and satisfaction. As shown in Table 2, at both the video and session levels, immersion labels show the highest correlation with satisfaction, followed by predicted immersion, mood labels, and behaviors. As a future implicit label obtained through EEG, the predicted immersion demonstrates a closer relationship with user satisfaction than existing metrics such as liking behavior and view ratio. In fact, its correlation with satisfaction even surpasses that of user-reported explicit mood labels. Therefore, we further propose that immersion can serve as a measure of user satisfaction and be utilized as an evaluation metric for an SV recommender system.
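The comparison of metrics against satisfaction at two levels can be sketched as below. The text does not specify how video-level metrics are aggregated to the session level, so averaging within each session is an assumption here, as are the field names (`session`, `predicted_immersion`, `view_ratio`, `satisfaction`) and the toy values.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def session_means(rows, key):
    """Average a video-level field within each session (assumed aggregation)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row["session"]].append(row[key])
    return {s: sum(v) / len(v) for s, v in buckets.items()}


def correlate_with_satisfaction(rows, metrics, session_satisfaction):
    """Return {metric: (video_level_r, session_level_r)}."""
    sessions = sorted(session_satisfaction)
    out = {}
    for m in metrics:
        video_r = pearson_r([r[m] for r in rows], [r["satisfaction"] for r in rows])
        means = session_means(rows, m)
        session_r = pearson_r([means[s] for s in sessions],
                              [session_satisfaction[s] for s in sessions])
        out[m] = (video_r, session_r)
    return out


# Toy rows with made-up numbers (NOT data from the study). Predicted immersion
# would come from the out-of-fold regression outputs of cross-validation.
rows = [
    {"session": "s1", "predicted_immersion": 2.0, "view_ratio": 0.9, "satisfaction": 2},
    {"session": "s1", "predicted_immersion": 2.5, "view_ratio": 0.4, "satisfaction": 3},
    {"session": "s2", "predicted_immersion": 3.5, "view_ratio": 0.6, "satisfaction": 4},
    {"session": "s2", "predicted_immersion": 4.0, "view_ratio": 0.5, "satisfaction": 4},
    {"session": "s3", "predicted_immersion": 4.5, "view_ratio": 0.8, "satisfaction": 5},
    {"session": "s3", "predicted_immersion": 3.0, "view_ratio": 1.0, "satisfaction": 3},
]
session_satisfaction = {"s1": 2, "s2": 4, "s3": 4}
table = correlate_with_satisfaction(
    rows, ["predicted_immersion", "view_ratio"], session_satisfaction
)
```

Each entry of `table` corresponds to one row of a Table 2 style comparison, with one coefficient per level.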
Answer to RQ3: Immersion is predictable in both regression and classification tasks, and adding EEG features improves the prediction. The predicted immersion is more closely related to user satisfaction than behavior metrics and mood labels, which illustrates the potential of leveraging immersion as a satisfaction indicator in a recommender system.

DISCUSSION AND LIMITATIONS
The preceding sections addressed three research questions around immersion in online SV interaction. Here, we discuss the practical implications and insights that arise from these findings. (1) Immersion is a favorable psychological experience that is associated with positive and high-arousal moods in SV browsing. Therefore, a controllable immersion status is desired for a better user experience. (2) A longer continuous browsing time does not correspond to a better immersion experience. This points to a healthier goal for recommender systems: they should not only focus on user retention but also optimize for a better immersion experience. (3) EEG is helpful in understanding immersion, particularly in the prefrontal and parietal lobes of the gamma band. Integrating EEG significantly enhances the predictive accuracy of immersion. (4) The correlation between user satisfaction and predicted immersion is higher than the correlation between user satisfaction and behavioral feedback or mood labels, highlighting the potential of using immersion as a strong indicator of user satisfaction in recommender systems.
Despite our best efforts, this work has several limitations. (1) The participant scale is limited. We recruited 30 participants from the university for both field and lab studies because of the significant costs involved in EEG collection. Similar participant scales have been utilized in previous EEG studies [20,48]. (2) While our aim is to study immersion in real-world online SV interaction, there are inherent disparities between users wearing EEG caps in a lab environment and their daily user experiences. To approach real-world conditions, we used a real recommended video stream, and users were allowed to swipe away videos at any time, just as they would normally do in their browsing activities. (3) The method used to predict immersion in this study is relatively straightforward, and more complex models may yield improved outcomes. Our focus is on demonstrating that a simple machine learning approach, directly concatenating EEG features, is effective in improving immersion prediction. This study represents an initial step and reveals that even with a simple approach, the predicted level of immersion shows a strong correlation with user satisfaction.

CONCLUSIONS AND FUTURE WORKS
To our knowledge, this is the first work to study immersion in short video (SV) interaction with EEG in real online scenarios. Three-step experiments (the survey, field study, and lab study) have been carried out. Factors related to immersion have been investigated, including video and audio factors, personalization, user mood, and interaction behaviors. Immersion has clear connections with the user's EEG signals in the prefrontal and parietal lobes of the gamma band. We further conducted experiments to predict immersion by integrating video and user features with EEG signals. Encouraging results demonstrate the potential of EEG signals for probing users' inner states in online browsing scenarios. The predicted immersion has a high correlation with satisfaction and shows its potential as a satisfaction evaluation metric in recommender systems. This is a pivotal initial step in researching responsible recommender systems. Moving forward, we aim to integrate immersion into SV recommendation systems with the goal of enhancing users' positive experiences, thus fostering the development of more responsible recommender systems.

Figure 1: The overall procedure of the three-step experiment: online survey, user field study, and lab study with EEG signals.

Figure 2: The distribution of session-level scores for low-, medium-, and high-immersion groups in each of the four dimensions of immersion in the survey.

Figure 3: The percentage differences of (a) audio features and (b) video features between low and high immersion groups. */** indicate p-value < 0.1/0.05 by independent t-test.

Figure 4: The distribution of self-reported immersion scores in four settings at session level in the lab study.

Figure 5: The distribution of immersion self-reported scores for every mood score of two dimensions (valence and arousal) in the lab study.

Figure 6: The relation between immersion scores and user behaviors. (a) and (b) show session-level immersion scores of different continuous browsing duration groups in the survey and field study. (c) and (d) are liking percentages and view ratios of different video-level immersion scores in the lab study.

Figure 8: The Pearson correlations of the immersion with DE in frequency bands of beta (12-30 Hz) and gamma (30-45 Hz) grouped by valence (negative or positive mood, a-d) and arousal (low or high arousal mood, e-h) in lab study.

Figure 9: The correlations of the immersion with DE in the frequency bands of alpha (8-12 Hz), beta (12-30 Hz), and gamma (30-45 Hz) grouped by personalized and non-personalized settings (a-d) and the behavior of liking and viewing without liking (e-h) in the lab study.

Table 2: Pearson correlation of self-reported satisfaction scores at the video and session levels with behaviors, mood, and immersion.