The Relationship between Users’ Behavior and Their Flow Experience in Gamified Systems

Modeling users' experience in gameful systems is one of the main contemporary challenges in the field of human-computer interaction. One of the most desired and complex experiences to identify is the flow experience (i.e., challenge-skill balance, action-awareness merging, clear goals, unambiguous feedback, concentration, sense of control, loss of self-consciousness, transformation of time, and autotelic experience). Facing this challenge, we conducted a quantitative study (N = 313) based on structural equation modeling, aiming to model and predict users' flow experience through their behavior in the system (represented by their performance, their interaction with gamification, and the time they take in different actions). The main results indicate that i) gamification (i.e., doing well in points, badges, and leader-board) was positively related to users' experience of good challenge-skill balance, ii) gamification was negatively related to users' concentration, and iii) user performance was positively related to users' concentration. Overall, however, the results indicate that while associations between user behavior and flow experience could be established, future work is still needed to fully explain users' flow experience while using a system. Our study contributes to the fields of human-computer interaction, gamification, and educational technologies, especially through insights related to modeling and predicting flow experiences in gameful systems through behavior data.


INTRODUCTION
In recent years, several types of educational systems (e.g., educational games [27,57,109], Massive Open Online Courses (MOOCs) [11,68,129], and Intelligent Tutoring Systems (ITS) [20,30,78]) have emerged intending to improve the quality of online education [58]. The use of educational technologies was further strengthened in 2020 due to the COVID-19 pandemic [1,72,113]. These systems have been increasingly used by instructors and students from different countries and have attracted the attention of researchers, who are investing in the design of this type of system [7,73,126]. The main idea is to improve the users' experience in this type of system [15,19,71].
A widely used technique to improve students' experience in educational systems is gamification (i.e., "the process in which services, activities, and systems are transfigured to promote similar motivational benefits as found in games" [44,64]) [6,8,85]. Gamification aims to improve the students' experience (e.g., engagement [75], motivation [124], and flow [74]) in educational systems [5,64,81], and depending on the application design, it can increase users' time in the systems and improve the students' learning experience [64].
At the same time, one of the most discussed experiences in studies on gamification in education is the flow experience [36,107,117], an experience of deep engagement that people can achieve during a certain activity [24] and that is highly linked to the learning experience (i.e., in general, those who achieve a high flow experience can also achieve a high learning experience) [22,96,108]. Thus, the flow experience is seen as a key experience for users to reach a desired behavior when using a certain type of system [50,61,122].
However, one of the main challenges in studies on flow experience in gamified educational systems is the modeling and measurement of this experience [23,28,69,82,91,106]. Often, the challenge arises because this analysis relies on approaches that are invasive (e.g., electroencephalograms (EEG) [10] and eye trackers [115]) or that cannot be applied massively (e.g., interviews [112] and questionnaires [111]) [82]. This makes it difficult to analyze the flow experience in gamified educational systems and consequently draws attention to the need for approaches that move towards the automatic identification of that experience [80,91,106].
To face this challenge, in this study (N = 313) we performed a data-driven analysis aiming to model and predict the students' flow experience (i.e., challenge-skill balance, merging of action and awareness, clear goals, feedback, concentration, control, loss of self-consciousness and autotelic experience) in a gameful system (i.e., a gamified educational system) based on the users' behavior data logs (e.g., the average response time on correct answers, the proportion of correct steps/activities, and points) during the system usage. We aimed to answer the following research questions: How to model users' flow experience through their behavior data in gameful educational systems? and How to predict users' flow experience through their behavior data in gameful educational systems? We used a robust data analysis technique (i.e., partial least squares structural equation modeling (PLS-SEM)) to analyze the data.
The main results indicate that i) gamification (i.e., points, badges, and leader-board position) is positively related to users' challenge-skill balance, ii) gamification is negatively related to users' concentration, and iii) performance is positively related to users' concentration. However, although the results regarding flow experience modeling through behavior data are significant, the internal predictive power is low, which does not allow the results to be generalized. With this study, we contribute to different fields such as human-computer interaction, gamification, and educational technologies, providing insights on how to model users' flow experiences in gameful systems through behavior data. Based on the results, we also present a series of research directions that can be taken into account in future research.

BACKGROUND
In this section, we present the study background (i.e., gamification in education, Flow Theory, and flow experience measurement) and a comparison between the main related works.

From gameful design to gamified education
Gameful refers to a state or quality of being like a game or having characteristics of a game [31,67]. It can be used to describe something that is playful, enjoyable, and engaging, and that incorporates elements of game design into its structure [35,48]. The goal of gameful design is to create an experience that is enjoyable, motivating, and meaningful, while still achieving the intended outcomes [48,67].
From the general concept of "gameful" arises "gamification" ("the process in which services, activities, and systems are transfigured to promote similar motivational benefits as found in games" [44,64]), which has been used in recent years in several areas of knowledge (e.g., marketing [121], health [128], and education [114]). However, the area with the most applications is education [6,63,64]. In education, gamification is studied in different aspects (e.g., design, application, and evaluation) [100]. In particular, over the past few years, because of advances in gamified education, several gamified educational systems have been implemented and used in educational settings [101].
The growing use of gamification in education is explained by the fact that gamification, when well applied, can improve different student experiences (e.g., engagement, motivation, and flow) [74,75,124]. These experiences can lead to a better learning experience in educational environments [26,93,125]. However, the growing number of users of gamified educational systems has created the challenge of evaluating the users' experiences in these systems [64], which is usually done through questionnaires [87].
Thus, one of the current challenges of gamification in education (i.e., gamified educational systems) is to use student interaction data logs in the systems to model and predict students' experience during system usage [64,80,87]. One of the possibilities in this context is to analyze the users' own interaction with the gamification elements (e.g., ranking, trophies, and points) and its relationship with the users' experience in the system.

Flow Theory and empirical flow experience
The Flow Theory was proposed by Csikszentmihalyi [24] and represents "an optimal experience that people have as a motivating factor in their daily activities" [37] or, in summary, an experience of deep engagement that a person can achieve in a given activity [21,24,25]. Over time, several studies related to Flow Theory have been conducted in different areas such as sports [53], video games [14], and education [108]. The flow experience can be reached in different types of activities and is composed of nine different dimensions [22,25,52]: (1) challenge-skill balance (CSB); (2) unambiguous feedback (F); (3) clear goals (G); (4) action-awareness merging (MMA); (5) total concentration on the task at hand (C); (6) sense of control (CTRL); (7) loss of self-consciousness (LSC); (8) transformation of time (T); and (9) autotelic experience (A). These flow experience dimensions can be further organized into "antecedents of flow" and "flow itself".
There are three antecedents of flow, which are considered factors that a certain activity needs to provide so that the other flow experience dimensions can be reached. They are:
• Challenge-skill balance: when experiencing flow, a dynamic balance exists between challenges and skills. Challenges and skills, however, can be changed in any activity, making flow an accessible experience across all domains of functioning [52].
• Unambiguous feedback: when receiving feedback associated with a flow state, the individual does not need to stop and reflect on how things are progressing [52].
• Clear goals: goals are a necessary part of achieving something worthwhile in any endeavor, and the focus that goals provide to actions also means that they are an integral component of the flow experience [52,54].
The flow itself is composed of six dimensions, which need to be reached jointly by an individual, thus forming the flow experience.
• Action-awareness merging: the unity of consciousness apparent in this flow dimension illustrates the idea of growth in complexity that results from flow experiences [52].
• Total concentration on the task at hand: one of the clearest indications of being in flow, that is, being totally focused in the present on a specific task being performed [52].
• Sense of control: like flow itself, the sense of control often lasts only a short period of time, and this relates back to keeping at the cutting edge of the challenge-skill balance in a situation [52].
• Loss of self-consciousness: it is liberating to be free of the "voice within our head" that questions whether we are living up to self- or other-imposed standards [52].
• Transformation of time: experiencing time transformation is one of the liberating dimensions of flow (to feel free from the time dependence under which we live most of our lives) [52].
• Autotelic experience: it is generally after completing a task, upon reflection, that the autotelic aspect of flow is realized and provides high motivation towards further involvement [52].
The flow experience is represented by the interconnection of these nine dimensions (antecedents and flow itself) [21,52,54]. That is, for a person to achieve the flow experience, the activity must provide all the antecedents of flow and, at the same time, the person must feel the other six dimensions of flow itself [23,25]. In education, the flow experience has also been extensively studied [47,60,92], and recent studies show that the flow experience is directly associated with the student's learning experience in different settings [36]. This means that if students can achieve a high level of flow experience in a given educational environment, they are more likely to have a high learning experience [93?]. Despite the various studies involving Flow Theory and education, one of the main remaining challenges related to Flow Theory in educational systems is the measurement of the flow experience [45,54,55,69,80].

Flow experience measurement
Flow experience measurement has been done in different ways over the years [83,97]. Initially, the experience was measured using a system that asked a person to press a button whenever they felt an experience of deep engagement (i.e., the flow experience) [21]. This kind of measurement generates some biases, is expensive, and cannot be conducted massively [80]. Therefore, to improve this situation, other methods have been proposed over the past four decades.
Among the first methods proposed were interviews and focus groups [25]. However, these methods are also costly and do not allow for massive application [80]. Thus, questionnaires/scales emerged as a way to measure the flow experience [55]. This technique has expanded and is still the most used method to measure the flow experience [83,97]. Also, over the past few years, several questionnaires have been proposed and validated in different domains (e.g., physical activity [54], sports [55], and gamification [45]).
Despite the advances, this method still presents problems, such as the difficulty of application. Thus, in the last decade other methods have also been proposed, such as the use of EEG or eye trackers [3,28,118,123]. Still, each of these methods runs into at least one of three problems: it is costly, it is invasive, or it cannot be applied massively [80].
Therefore, more promising methods are approaches that analyze the flow experience based on the data logs produced by users in educational systems [69,88,91,106]. In general, these approaches relate the users' flow experience to the data logs that those users produce. However, this line of work is still incipient, with few studies (see subsection 2.4), and requires further studies with larger sample sizes and data analysis using different techniques.

Related work
To identify the main related works and provide a deep understanding of the field, we analyzed the results of three systematic literature reviews conducted by Perttula et al. [97], Oliveira et al. [82], and Oliveira et al. [87], who described the state of the art on Flow Theory and educational technologies (including the most used methods for identifying the students' flow experience in educational systems). We then also performed an exploratory review aiming to find new studies. The results show that in recent years, few studies have sought to propose approaches to automatically identify the users' flow experience in educational systems [82], highlighting the importance of approaches that, for example, relate the students' flow experience with their data logs during the system's usage.

Measuring flow experience.
Wang and Hsu [118] used a questionnaire associated with an EEG analysis aiming to investigate the effects of students' challenge-skill balance on their flow experience, as well as the effects of students' flow experience on their learning. Their results showed that the students' flow experience depends on a challenge-skill balance of learning materials [118]. In this study, Wang and Hsu [118] also investigated the possibility of using an inexpensive nonmedical EEG device to research the association between flow experience and challenge-skill balance in the system.
Akcan [3] used a flow scale to measure the players' flow experience in advergames (considering the nine flow experience dimensions). At the same time, the author also analyzed the participants' eye movements using eye-tracking data. The study does not present an analysis of the correlation between the players' flow experience level, the places in the games where the players looked the most, or their eye movements. However, the study opens up the possibility of correlating the flow experience of players with eye movements.
Wu et al. [123] used an EEG to measure the real-time flow states of different students. The study revealed a whole-part association between students' momentary and overall reflective flow experiences. The results indicate that it is possible to correlate the students' flow experience with their behavioral pattern (detected by the EEG), thus opening space for other types of analyses [123].

Measuring flow experience through behavior data logs.
Lee et al. [69] conducted a study to identify whether users are in a flow experience, using a sample of 55 participants. They used step regression (i.e., a data mining technique) to analyze the students' data logs and compare them (i.e., students' behavior) with their flow experience. In their study, Lee et al. [69] implemented only one of the nine flow experience dimensions (i.e., challenge-skill balance).
Kock [28] proposed an approach to automate the flow state identification using an EEG with 20 participants during the use of an educational game, aiming to associate seven different brain dimensions with the participants' flow experience. To assess the participants' flow experience, the author used the Abbreviated Flow Questionnaire (AFQ). The results show an association between the participants' flow experience and some specific brain dimensions [28].
Challco et al. [13] conducted a study proposing a framework to integrate the learner's growth process with the flow state to lead and maintain the students in flow during the educational system usage. Challco et al. [13] also operationalize flow only as the perception of the challenge-skill balance dimension, without considering the other flow experience dimensions.
Oliveira et al. [91] proposed a theory-driven conceptual model, associating students' interaction data logs with each of the flow experience dimensions. They evaluated the proposal with three different experts. Despite representing an advancement towards automatic flow experience identification in educational systems, the model has not been evaluated with real data, and the authors recommend its validation with real data produced in educational systems [91].
Oliveira et al. [88] conducted a qualitative study (with six participants) through the think-aloud protocol to associate user data logs with the user flow experience within an educational system. The study identified a relation between four types of data logs and seven of the nine flow experience dimensions [88]. Despite these promising results, they were obtained through a qualitative study and need to be confirmed through quantitative studies based on data from more users.
Semerci and Goularas [106] conducted a study to automatically capture the interaction of students in an e-learning environment and use these data to evaluate their flow state in a course. The sample was composed of 87 students from two different departments of different faculties [106]. Analyzing data through heatmaps and deep neural networks, they found a significant correlation between the survey results (flow experience) and students' performance and activity. These results highlight the need to carry out similar studies, including new types of data logs and individually analyzing all Flow Theory dimensions.
In a sequence of more recent studies, Oliveira et al. [86,89,90] used behavior data logs to model the flow experience in games and gamified systems. The results showed that some behavioral data can be associated with some flow experience dimensions. However, these studies were conducted with limited samples, and according to the authors themselves, despite representing an advance in the literature, the results need to be further investigated, especially with larger samples in different systems [86,89,90].
Muramatsu et al. [77], using behavioral data produced by users, evaluated the applicability of employing a single type of behavior data (i.e., mouse click frequency) as an exclusive metric to model and predict students' flow experience. In two data-driven studies (N1 = 25 | N2 = 101), they identified that mouse click frequency on its own is not able to predict the flow experience [77].

Summary.
The studies aiming to model or predict students' flow experience through data logs focus on analyzing a single flow experience dimension or present more exploratory analytical approaches, which do not allow obtaining more confirmatory insights related to modeling and predicting the flow experience through behavior data (although one of the reviews and the exploratory review were conducted free of domain, only studies in the general field of education were identified). As far as we know, our study is the first aiming to model and predict students' flow experience through users' behavior data logs in gamified educational systems using a validated theoretical model (considering all nine flow experience dimensions [22,24,54]).

RESEARCH DESIGN
Our study is characterized as a data-driven study [32], analyzing data of participants using a gamified educational system.

Research questions and hypothesis
Our study aims to model and predict users' flow experience in gamified educational systems through their behavior data logs produced during the system usage. Thus, we aim to answer the following research questions (RQs): How to model users' flow experience through their behavior data in gameful educational systems? and How to predict users' flow experience through their behavior data in gameful educational systems?
Over the years, studies have shown that the flow experience is highly related to the student's learning experience in educational settings [23,36,96,108]. At the same time, different recent studies proposed that there is a relationship between different types of user experience and the data logs produced by these users in the systems [49,51,130]. Additionally, studies have proposed that there is a direct relationship between the users' flow experience in educational systems and the data logs that are produced by these students in the system [28,69,80,88,106]. Thus, in this study, we hypothesized that it is possible to model and predict the students' flow experience through their behavior data in a gamified educational system.

Materials/instruments and method
To carry out the study, we used the system "Learning in Flow" [4], a gamified educational prototype composed of 20 educational activities related to logical reasoning. The activities are of different levels, from easy to difficult (in sequential order). The questions were initially defined and analyzed by Albuquerque et al. [4,77,104]. No minimum time was defined for using the system, to make the activity as free as possible and maximize the chances of participants having a less forced and more spontaneous experience when using the system. The activity was considered finished only after all 20 tasks were answered.
The system was chosen because it was created specifically for carrying out this type of study. At the same time, the system has the most used gamification elements in gamified educational systems (i.e., points, badges, ranking, levels, progress bars, and avatars [29,46,64]). The system was also already used and analyzed in other recent studies [4]. Figure 1 presents the system home page (where participants can select an avatar to represent themselves in the system) and the activity page (where participants can do the activities). To collect the students' data logs during system usage, a module was implemented in the system. The module collected nine different data logs according to the theoretical model proposed by Oliveira et al. [91] for the automatic identification of students' flow experience in educational systems:
• Active time in the system (ActTS): Total time that a user spends in each session in the system (from login until logout).
• Used time to finish a step/activity (Art): Different from the first item, this represents the total time that a user takes to finish a specific activity/task or a step in the system (in our study, we divide this data into two types: a) average response time in correct answers (ArtCA) and b) in incorrect answers (ArtIA)).
• Proportion of correct steps/activities (ProCS): Average of a user's correct answers in a group of activities/tasks in the system.
• Proportion of help requests (ProHR): Average of a user's help requests for completing an activity/task in the system.
• Proportion of correct steps/activities after feedback (ProCSF): Average number of times a user has correctly answered a step/activity after a feedback message stating the step/activity result.
• Average response time after feedback: Average time a user spends to answer a question/task after receiving feedback from the system (in our study, we divide this data into two types: a) average response time after positive feedback (ArtPF) and b) after negative feedback (ArtNF)).
• Total unique session views (TV): Number of times that a user tries to do the same activity/task (e.g., number of times the user sees the same tutorial).
• Number of mouse clicks out of buttons (NMC): Average number of times a user clicks on a neutral area of the screen that does not trigger any action (e.g., clicks on a text area).
In addition to the data proposed in [91], we also collected the total of consecutive hits (TCH) and the average of consecutive hits (ACH). We decided to include these new data to have more data related to the performance of the participants in the system.
To identify the students' flow experience, we used the short flow state scale (short FSS) proposed by Jackson and Eklund [54], which consists of nine questions representing the nine original flow experience dimensions proposed by Csikszentmihalyi [24]. The questionnaire was chosen because it is the most used questionnaire in studies related to Flow Theory and technologies in education [82]. The questionnaire was also previously validated by Hamari and Koivisto [45] for the gamification domain. The instrument was applied through a five-point Likert scale [70], as recommended in the "Flow State Manual" developed by Jackson et al. [52]. Additionally, to mitigate threats to validity related to the participants' attention during the study, following the recommendation of Kung et al. [65] and the example of other studies in our field [84,90,103], we added an "attention check statement" (i.e., if you are filling out the form carefully, answer 4). Students who missed the attention check question were removed from the final data analysis. In the appendix, we present the short FSS used in our study.
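The screening rule described above can be illustrated with a minimal Python sketch. It assumes each response is stored as a dictionary; the key names (`attention_check`, `id`) and the use of the dimension abbreviations as keys are hypothetical choices for illustration, and averaging the nine items into an overall flow score is an assumption of the sketch, not a scoring rule stated in the text.

```python
from statistics import mean

# From the text: attentive participants were instructed to answer 4.
ATTENTION_CHECK_EXPECTED = 4

# Abbreviations follow the Background section; using them as dictionary
# keys is an assumption of this sketch.
FLOW_DIMENSIONS = ["CSB", "F", "G", "MMA", "C", "CTRL", "LSC", "T", "A"]


def passed_attention_check(response):
    """Return True when the attention check item was answered as instructed."""
    return response.get("attention_check") == ATTENTION_CHECK_EXPECTED


def keep_attentive(responses):
    """Drop every response that missed the attention check."""
    return [r for r in responses if passed_attention_check(r)]


def overall_flow(response):
    """Mean of the nine five-point items (assumed scoring, for illustration)."""
    return mean(response[d] for d in FLOW_DIMENSIONS)


# Made-up responses, only to exercise the functions.
responses = [
    {"id": "p1", "attention_check": 4, **{d: 4 for d in FLOW_DIMENSIONS}},
    {"id": "p2", "attention_check": 2, **{d: 5 for d in FLOW_DIMENSIONS}},
    {"id": "p3", "attention_check": 4, **{d: 3 for d in FLOW_DIMENSIONS}},
]
valid = keep_attentive(responses)
print([r["id"] for r in valid])
```

In this toy data, the second response is discarded because its attention check answer differs from the instructed value.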
Regarding the method, we organized the study into four steps. Initially, in the first step, as Connelly [18] recommended, a pilot study was conducted with 10 participants. The pilot study analyzed whether the system was working correctly and whether the amount paid to participants was sufficient.
After conducting the pilot study, in the second step, we started the recruitment of the study participants. We used two different platforms to recruit participants. The first was Amazon Mechanical Turk (MTurk), a crowdsourcing marketplace service highly used and recommended for experiments with humans [95]. To collect the user data using MTurk, we followed the 10 good practices (recommendations) proposed by Aguinis et al. [2]. In particular, we defined clear reward rules for participants, allowing them to judge whether the reward amount was appropriate for the study they had participated in. Each participant received 25 cents for their participation.
The second platform, Prolific, is another crowdsourcing marketplace service highly used and recommended for experiments with humans [94]. On this platform, each participant received £0.63 for their participation; the cost is calculated automatically according to the duration of the experiment. We chose these two platforms to recruit participants from different countries and cultures, thus obtaining a group of participants that is heterogeneous in demographic aspects. To have participants with different profiles, no criteria for participation were previously stipulated. The data was collected in January 2020. Then the data was organized and processed into the correct format for data analysis. Figure 2 presents the study organization.

Fig. 2. Study organization: figure with four balloons ("Pilot studies", "Participant recruitment and data collection", "Data processing", and "Data analysis") in a sequence, describing the study organization. Recruitment used MTurk and Prolific, and data analysis used PLS-SEM.

Participants and data analysis
We initially received 330 responses, 17 of which were removed because the participant answered the attention check question incorrectly. Our final sample comprised 313 participants (174 self-declared as male, 137 self-declared as female, and two preferred not to inform), from 32 different countries, with an average age of 23 years (Table 1 presents our sample details). To calculate our sample size, we used an a priori analysis based on the anticipated effect size and the desired probability and statistical power levels [17,120]. We calculated the necessary sample size using the online calculator provided by Soper [110], considering an anticipated effect size of 0.3 (medium), a desired statistical power level of 0.8 (by convention), and a probability level of 0.05 (by convention) [17,120]. The sample size of our study is also considered adequate under other criteria: according to Bentler and Chou [9], there must be a minimum ratio of five respondents per construct in a model (in our study, we have nine constructs, i.e., the nine flow experience dimensions). At the same time, Kyriazos et al. [66] state that at least 100 participants are required as the minimum sample size in this kind of study.
To analyze the data, we used partial least squares structural equation modeling (PLS-SEM) [42], which is a useful technique for evaluating complex theoretical relationships between multiple variables, especially in social science research [42,43,116]. Two fundamental SEM methods have been proposed and used over time: covariance-based structural equation modeling (CB-SEM) and partial least squares structural equation modeling (PLS-SEM) [40]. We decided to use PLS-SEM because it is especially useful when the objective of the structural model is to predict and explain the target outcomes, as captured by in-sample and out-of-sample metrics [56]. It thus allows modeling the relationships between variables (β), analyzing the statistical significance of these relationships (p), and identifying the internal predictive power of the estimated model (R²). The technique has been widely used in recent studies in the field of gamification [98,103,119], as it allows performing this type of analysis with a high level of reliability, even in smaller samples (e.g., N < 1000) [66].
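As a worked check of the two sample-size rules of thumb cited in this subsection, the short Python sketch below compares the final sample with the minimum implied by each rule. The function name is hypothetical, and the sketch does not reproduce the a priori power calculation performed with the online calculator.

```python
def min_sample_by_ratio(n_constructs, respondents_per_construct=5):
    """Minimum sample implied by a respondents-per-construct ratio rule."""
    return n_constructs * respondents_per_construct


# Figures reported in the text.
responses_received = 330
failed_attention_check = 17
n_final = responses_received - failed_attention_check

n_constructs = 9          # the nine flow experience dimensions
absolute_minimum = 100    # minimum sample size attributed to [66] in the text

ratio_minimum = min_sample_by_ratio(n_constructs)

print(f"final sample: {n_final}")
print(f"ratio-rule minimum: {ratio_minimum}")
print(f"meets both rules: {n_final >= max(ratio_minimum, absolute_minimum)}")
```

Both thresholds sit well below the final sample, which is why the text treats the sample as adequate under these criteria.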
In our study, observable variables related to user behavior (i.e., users' data logs) were grouped into latent variables (i.e., variables that are not directly observed but are modeled from observable variables that can be directly observed or measured [34]) based on the type of data collected. Thus, the data were transformed into three latent variables (i.e., gamification, performance, and time), representing the users' behavior:
• Usage time: i) active time in the system, ii) used time to finish a step/activity, iii) average response time after negative feedback, and iv) average response time after positive feedback.
• Users' performance: i) average of consecutive hits and ii) total of consecutive hits.
• Gamification: i) total of points, ii) total of badges, and iii) ranking position.
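The grouping of indicators into latent variables listed above can be written down as a simple mapping. The Python sketch below is illustrative only: the column names `points`, `badges`, and `ranking` are hypothetical, and the equal-weight average of z-scores is a crude stand-in chosen for readability, not the indicator weighting that PLS-SEM estimates.

```python
from statistics import mean, pstdev

# Indicator groups as listed in the text; "points", "badges", and "ranking"
# are hypothetical column names for the gamification indicators.
LATENT_INDICATORS = {
    "usage_time": ["ActTS", "Art", "ArtNF", "ArtPF"],
    "performance": ["ACH", "TCH"],
    "gamification": ["points", "badges", "ranking"],
}


def zscores(values):
    """Standardize a column; a constant column maps to zeros."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def composite_scores(rows, indicators):
    """Equal-weight mean of standardized indicators, one score per row.

    Assumption: equal weights are used only for illustration; PLS-SEM
    estimates indicator weights from the data instead.
    """
    columns = [zscores([row[name] for row in rows]) for name in indicators]
    return [mean(col[i] for col in columns) for i in range(len(rows))]


# Made-up log summaries for two users, only to exercise the functions.
rows = [
    {"ACH": 1, "TCH": 2},
    {"ACH": 3, "TCH": 6},
]
print(composite_scores(rows, LATENT_INDICATORS["performance"]))
```

A real analysis would also need to decide how to orient indicators such as ranking position, where a lower value means a better result.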

RESULTS
Before the main analysis, we examined the simple correlations between the user behavior variables and the flow experience dimensions (including overall flow). As these relations are non-linear, we used Kendall's correlation [16]. The Kendall rank coefficient is commonly used to test whether two variables are statistically dependent [59]. Table 2 presents Kendall's correlations between all variables measured in the study. The results show that, although some relationships are significant, the correlations are weak.
Next, we modeled and internally predicted the relationships between users' behavior data logs and their flow experience during system usage. We first calculated the composite reliability (CR) and average variance extracted (AVE) of the latent variables used in the model. CR measures the internal consistency of a group of items used to measure a latent variable in SEM, that is, the extent to which the items consistently measure the same underlying construct. It ranges from 0 to 1, with higher values indicating greater internal consistency of the indicators; a value of 0.700 or above is generally considered acceptable [39,41,99]. AVE measures convergent validity in SEM, indicating the amount of variance that a latent construct shares with its indicators relative to the amount of variance due to measurement error. It is calculated as the sum of the squared loadings of the indicators divided by that sum plus the sum of the indicators' error variances; for standardized indicators, this equals the average of the squared loadings. An AVE of 0.5 or higher is typically considered acceptable, indicating that the indicators adequately measure the latent construct [39,41,62]. In our study, CR and AVE serve only to observe the relationship between the observable variables that compose each latent variable, rather than to assess the quality of the model, as in other studies that use the same technique. Table 3 presents the composite reliability results.
Next, we calculated the discriminant validity (DV) [12], which assesses whether concepts that are not supposed to be related are actually unrelated, that is, whether a construct can be distinguished from the other constructs in the same model. Ideally, the correlation coefficients between theoretically unrelated constructs should be low or non-significant [39,41,62]. In our study, this calculation is also observational, since we do not seek to propose a model but rather to analyze the relationships between variables. Table 4 presents the discriminant validity.
Finally, we modeled and examined the internal predictive power (i.e., the ability of a model to predict the observed variables within the model) between users' behavior data logs and their flow experience when using the system. Specifically, internal predictive power was measured with R² values, which give the proportion of variance in the observed variables that the latent variables in the model can explain. R² values range from 0 to 1, with higher values indicating better predictive power [39,41,105]. Table 5 presents the path coefficients, and Table 6 presents the internal predictive power of the model.
We then repeated the analyses considering the flow experience as a whole rather than its individual dimensions. We ran separate models because the literature treats the flow experience as an association of all dimensions, while studies usually analyze each dimension separately. Table 7 presents the CR, Table 8 presents the DV, Table 9 presents the path coefficients, and Table 10 presents the internal predictive power.
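The CR and AVE thresholds described above can be checked with a few lines of code. The sketch below assumes standardized indicator loadings and uses the common formulas CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)) and AVE = Σλ² / k, where k is the number of indicators. The loadings and function names are hypothetical and do not come from the study's data.

```python
def composite_reliability(loadings):
    """CR from standardized loadings: (sum λ)^2 / ((sum λ)^2 + sum(1 - λ^2))."""
    total = sum(loadings)
    error_variance = sum(1.0 - l * l for l in loadings)
    return total * total / (total * total + error_variance)


def average_variance_extracted(loadings):
    """AVE from standardized loadings: mean of the squared loadings."""
    return sum(l * l for l in loadings) / len(loadings)


# Hypothetical standardized loadings for one latent variable (assumption).
loadings = [0.80, 0.70, 0.60]

cr = composite_reliability(loadings)
ave = average_variance_extracted(loadings)

print(f"CR  = {cr:.3f} (acceptable: {cr >= 0.700})")
print(f"AVE = {ave:.3f} (acceptable: {ave >= 0.5})")
```

For these hypothetical loadings, CR is about 0.745 (above 0.700) while AVE is about 0.497 (just below 0.5), which shows that the two criteria can disagree for the same set of indicators.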

Discussion
Identifying the flow experience is a challenge that has been addressed for decades. Several alternatives have been proposed over time to analyze the flow experience. Most methods require participants to answer scales or use body-worn equipment. Facing this challenge, we explored the possibility of using user behavior data logs from a gamified system to model and predict users' flow experience. The results indicate that behavior data can be used to model some dimensions of the flow experience. However, the results also indicate that the predictive power is low, so users' flow experience could not be reliably predicted from their behavior log data.
First, gamification (i.e., number of points and badges and ranking position) was positively associated with one of the flow experience antecedents, i.e., challenge-skill balance (β = 0.366 | p = 0.011). This result suggests that users who perform well in gamification tend to have a greater sense of challenge-skill balance during the tasks. Many studies consider this dimension the main antecedent of the flow experience [33,38,76]. Thus, this result suggests that gamification may be a factor that directly affects an antecedent of participants' flow experience. On the other hand, gamification was negatively associated with participants' concentration (β = 0.279 | p = 0.047). One possible reason is that participants who performed better in gamification were unable to keep their attention on the system, dividing it between the activities and the gamification elements and, consequently, losing concentration. The study reported in this article was conducted over a short period. Thus, this result corroborates other recent studies that draw attention to the fact that gamification can have a more immediate effect on users' perception [102].
This result is also supported by the finding that participants with better performance had higher concentration (β = 0.308 | p = 0.018). A possible explanation is that participants with better performance (regardless of the gamification) managed to keep their attention only on the activities and, consequently, maintained a higher level of concentration.
In our study, time did not significantly affect any of the dimensions of the flow experience. Although this result contradicts the results of other studies [91], it may be directly related to the nature of the study. Because this was a quasi-experimental study in which all participants had to perform the same activities (including the same number of activities), time may not directly affect any of the dimensions of the flow experience.
Regarding the internal prediction level (R²), the results indicate that even where the modeling results were significant, the internal prediction levels remained low. Two factors may explain this. First, the observable variables that compose the latent variables have values that are not strongly correlated, which reduces the levels of prediction. Second, the sample size may play a role: although the modeled relationships were significant, a larger sample is needed to attest to significant levels of prediction.
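As a rough illustration of why a significant path can coexist with low internal predictive power, consider the simplified case in which a standardized outcome has a single standardized predictor: there, R² equals the squared path coefficient. The sketch below applies this simplification to the magnitudes of the three significant coefficients reported above. This is an assumption made for illustration only, since the study's models include several predictors per outcome, and the function name is hypothetical.

```python
def r_squared_single_predictor(path_coefficient):
    """R² for one standardized predictor of a standardized outcome.

    In that special case R² is the squared coefficient; with several
    correlated predictors this identity no longer holds.
    """
    return path_coefficient ** 2


# Magnitudes of the significant path coefficients reported in the text.
paths = {
    "gamification -> challenge-skill balance": 0.366,
    "gamification -> concentration": 0.279,
    "performance -> concentration": 0.308,
}

explained = {name: r_squared_single_predictor(b) for name, b in paths.items()}

for name, r2 in explained.items():
    print(f"{name}: R² ≈ {r2:.3f}")
```

Under this simplification, each path on its own would account for roughly 8% to 13% of the variance in its outcome, which is consistent with significant but weak relationships.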
Regarding modeling the users' overall flow experience, no significant results were identified. In general, our results indicate no relationship between user behavior data in the system and the overall flow experience. This result confirms most of the results found in the literature. In general, the literature avoids identifying the flow experience itself, analyzing only its antecedents [69] or only its individual dimensions [91,106]. This choice is generally based on the inherent difficulty of analyzing the flow experience itself, given its level of depth.
In a general comparison, our results are in the same direction as the recent literature, indicating that there is no direct relationship between user behavior data in gamified systems and their flow experience when using the system. However, our results indicate that behavioral data hold promise for modeling and predicting some of the dimensions of the flow experience.
Our results offer some theoretical and practical contributions. The first concerns the role of gamification elements (i.e., points, badges, and leaderboards) in the user flow experience. We identified that these elements are related to users' concentration on the task at hand (negatively, in our data), which suggests that gamification can shape user engagement in various activities. Furthermore, we identified that gamification is also related to the balance between perceived challenge and user skills. This finding highlights the importance of carefully considering the design of gamification elements to promote a flow experience.
Another contribution of our research is related to the impact of users' performance on the balance between challenge and skills. We identified that users' performance affects perceptions of this balance, suggesting a bidirectional relationship between the flow experience and personal achievement. This finding may have important implications for promoting intrinsic motivation and developing users' skills. The results indicate that interventions aimed at improving performance can potentially influence users' flow experience, reinforcing the importance of promoting an environment conducive to personal growth/performance.
Finally, our research contributes to the theoretical understanding of the flow experience by exploring the use of user behavior records in a gamified system. By employing behavioral data as indicators of the flow experience, we provide valuable insights for researchers and practitioners interested in understanding and facilitating flow in different contexts. This innovative approach expands the possibilities of analyzing the flow experience, allowing future research to explore further the relationships between user behavior, systems design, and subjective experience, deepening our understanding of this critical theoretical construct.

Threats to validity and limitations
Our study was conducted with human participants, which introduces possible threats inherent to this kind of study. Next, we describe how each of these limitations or threats was addressed or mitigated. First, some researchers consider the flow experience to be subjective and dependent on each individual [23,24,52]. This can make identifying the flow experience complex. To mitigate this threat to validity, we used only previously validated instruments to analyze the users' flow experience (i.e., the short FSS proposed by [54] and analyzed psychometrically by [45] for the gamification domain).
Similarly, the relationship between log data and user flow experience is not yet established in the literature. To mitigate threats related to which type of data to collect, we collected data according to the theoretical model proposed by Oliveira et al. [91]. Another difficulty inherent in this type of analysis is collecting data from people from different cultures. To maximize the generality of the results in terms of participants' culture, we collected data on international platforms (i.e., MTurk and Prolific), so that we received data from participants from different countries and, consequently, with different background profiles (increasing the generalizability of the results).
Studies of this type require a large sample to increase the generalization power of the results. Thus, our sample may not be sufficient to ensure the generality of the results. To mitigate this limitation, we chose a modern technique (PLS-SEM) capable of producing reliable results even with smaller samples. Using this technique, our sample is sufficient to perform multilevel modeling and SEM [127], as well as bootstrap estimation [79]. Likewise, the study was performed on a single system, so the results may not be generalizable to other systems.
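To make the bootstrap estimation mentioned above concrete, the sketch below computes a percentile bootstrap confidence interval for Kendall's correlation between two variables. It is a simplified stand-in, not the PLS-SEM bootstrapping used in the study: it uses the tau-a variant (no tie correction), a small hypothetical data set, and hypothetical function names.

```python
import random


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs; ties count as 0."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                score += 1
            elif product < 0:
                score -= 1
    return score / (n * (n - 1) / 2)


def percentile_bootstrap_ci(x, y, n_resamples=2000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for tau-a, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(x)
    estimates = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(kendall_tau_a([x[i] for i in idx], [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int(n_resamples * alpha / 2)]
    upper = estimates[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical data: points earned and self-reported challenge-skill balance.
points = [10, 25, 30, 45, 50, 60, 75, 80, 90, 100]
balance = [2, 3, 2, 4, 3, 5, 4, 5, 4, 5]

tau = kendall_tau_a(points, balance)
ci_low, ci_high = percentile_bootstrap_ci(points, balance)
print(f"tau-a = {tau:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

Resampling whole (x, y) pairs keeps each participant's observations together, which is what lets the spread of the resampled estimates approximate the sampling variability of the statistic.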
The study was conducted online, without real-time observation of the participants' actions. Thus, external factors may have affected the participants' experience. To mitigate this limitation, we used pre-defined tasks for all participants and chose not to remove possible outliers, thus avoiding the loss of data that represent plausible behavior of a user when using the system. Although the MTurk and Prolific platforms are widely used in studies in the area, their limitations are recognized. Thus, many studies in the area tend to have limitations inherent to the use of these tools. Faced with the impediment of using only data from voluntary participants, we decided to merge data from voluntary participants with paid participants from the two different platforms, thus making the data more generalizable, also following good practices in the use of these platforms.

Recommendations for future studies
As the use of gamified systems continues to increase, understanding how users engage with these systems and the factors that contribute to their experience is becoming increasingly important. Behavior data logs provide a wealth of information that can be used to model and predict users' flow experience in these systems. Our study advanced the literature; however, there are numerous challenges associated with the collection and analysis of behavior data, as well as opportunities for future research in this area. Next, we explore some of these challenges and opportunities based on the results of our study.
• The eternal problem of sample size: In our study, as in the vast majority of studies in the area, the sample size is sufficient to conduct adequate analyses. However, it is not sufficient to provide generalizable results. This is due to a series of factors, ranging from the time dedicated to research projects to financial reasons (lack of resources to carry out some projects). For this type of analysis to yield generalizable results, as has been recurrently recommended in studies in the area, it is important to conduct studies with much larger samples (N > 1000 participants). Therefore, we recommend that the community make an effort (e.g., joining and mixing resources from different research groups) to carry out studies with larger samples that allow greater power of generalization.
• Beyond current behavioral data: In all studies conducted to date, behavioral data boils down to data coming from user interactions on desktop/laptop computers. However, a migration of users to other types of devices, ranging from smartphones to metaverse devices, is increasingly noticeable. On these devices, the behavior data can be completely different and need to be analyzed individually. Given this, we recommend that future studies invest in using behavioral data from other devices.
• Multiple associations: In our study, we associated users' behavior data logs with their flow experience collected through a scale. However, flow experience can be collected in other ways (e.g., eye tracking and EEG). EEG and eye-tracking data provide objective measures of users' cognitive and physiological responses to gamified systems, which can complement and enrich the insights obtained from self-reported data. Thus, to ensure even more accurate analysis, we recommend that future studies associate users' behavior data logs with multiple sources of data, including data collected through electroencephalography (EEG) and eye tracking.
• Measuring the flow over time: One of the main challenges in modeling and predicting users' flow experience in gamified systems is that the experience can vary over time. Our study (like the other studies in this field) measures the flow experience only at a particular point in time. Thus, we recommend exploring approaches that can automatically and continuously monitor users' flow experience over time.

CONCLUDING REMARKS
In this study, we explored the possibility of using user behavior data logs in a gamified system to model and predict the participants' flow experience. The data were analyzed to identify relationships between the users' behavior data logs and their flow experience. The main results indicate that gamification is positively related to users' challenge-skill balance, which is a key factor in promoting the flow experience. In addition, the results showed that performance is positively related to users' concentration, which is another important aspect of the flow experience. In future studies, we aim to replicate this study with a larger sample to validate the findings and explore the relationship between user behavior data and the flow experience more comprehensively. We also aim to include different types of data analysis methods, such as machine learning and predictive modeling, which could enable more accurate predictions of the flow experience.

NOTES
Previous studies of this project have been published: Oliveira et al. [82] conducted a systematic literature review about Flow Theory and Educational Technologies; Oliveira [80] presented the project overview; Oliveira et al. [91] proposed a theoretical model relating students' data logs and their flow experience in educational systems; Oliveira et al. [88] conducted a qualitative study analyzing students' data logs and their flow experience in educational systems; Oliveira et al. [86,90] conducted data-driven studies modeling and predicting (respectively) students' flow experience based on their data logs in a gamified educational system; and Oliveira et al. [89] investigated the relationship between students' flow experience and their behavior data in a gamified educational system.
Fig. 1. Examples of the system used in the study: two figures showing the system used in the study. The figure on the left side presents a ranking (with leaderboards), avatar options that can be chosen by users, and a space where trophies will appear. The figure on the right side shows a ranking (with leaderboards), a logical reasoning quiz (one of the questions answered by the study participants), and the space where the trophies will appear.

Table 1. Sample details
Key: N: number of participants per country listed in the "countries" column.

Table 6. Internal predictive power

Table 7 presents the CR.

Table 8. Discriminant validity for the overall flow experience (complete bootstrapping, sample=5000)