Visualizing the Spatio-Temporal Evolution of Gameplay using Storyline Visualization: A Study with League of Legends

Players increasingly adopt a data-driven approach to review and improve their gaming skills. In the wake of this, spatio-temporal visualizations have gained popularity but remain challenging to design. Storyline visualizations are unique in the way they integrate time and location information into a single view to show how entity relationships develop over time. We adopt the storyline visualization technique to summarize gameplay for the purpose of post-play review. We demonstrate the method by applying it to League of Legends matches and evaluate it with 39 players of the game in a task-based online study using the triad framework for spatio-temporal queries by Peuquet. Results indicate that players responded positively to the approach and could, by and large, solve tasks well but that time-based tasks proved most challenging and least efficient to solve. Based on our findings, we reflect on possibilities for enhancing the design of storyline visualizations for game-related data analysis.


INTRODUCTION
Game developers are increasingly tracking data about the in-game activities of their players. While ultimately players may indirectly benefit from this, the direct recipients of the gathered data are and have mainly been the developers themselves, who use it for marketing and games user research activities. However, in-game data has also become an important source of information for players. This demand can be attributed largely, although not exclusively, to the rise of competitive online gaming and esports. Such games are inherently skill-based, requiring the players to understand the intricacies of gameplay to achieve mastery (see also [27]). Steep learning curves and being required to practice the game while already competing against others [46] complicate skill acquisition further. As such, players have started to seek out assistance by relying on their own or others' collected behavioral data to reflect upon their performance or compare themselves to and learn from others (cf. [25,65]). Moreover, as elaborated by Kleinman et al. [27], many players, in the absence of a coach or another expert in the game, need to learn the game on their own. Data-driven tools support this need for self-directed training in-situ or, at the user's own pace, post-play. They enable players to identify mistakes or skill gaps, allow them to develop new strategies, and provide decision support through objective data.
Developers have recognized this interest and started exposing data about in-game behavior and performance to players, for example, through public Application Programming Interfaces (APIs). This, in turn, has further facilitated player-facing community and developer efforts, which is reflected in various websites and tools that offer support for in-game data analysis [20,25]. However, such data can be complex, especially when it goes beyond aggregated performance statistics and instead describes in-game behavior in space and across time. Indeed, an interview study with esports players [27] revealed that they find it challenging to track what is happening in-game, including having difficulties with spatial awareness, that is, understanding, for instance, where players are situated in relation to other players, units, items, objectives, or the geographical game space in general. Likewise, Wallner et al. [65] found that spatial information such as positions and movements is important for making gameplay decisions across three popular competitive game genres. Visualizations and visual analytics tools are well-suited for such a spatio-temporal reconstruction of gameplay.
Kleinman et al. [26] have shown that spatio-temporal game data visualizations afford different activities ranging from the low-level examination of data points, to understanding positioning and movement, to sense-making and validation through hypothesis confirmation. However, existing training tools for players often include visualizations that are time-invariant and focus on spatial patterns only (e.g., [64]) or focus on the timing of actions while neglecting their spatial organization (e.g., [9]). Consequently, they often include multiple views to capture both dimensions. Others, such as Afonso et al. [3] or MacCormick and Zaman [34], rely on animations or playbacks which have their advantages but also place higher demands on working memory (cf. [10] for a discussion). In addition, watching video recordings or replays of matches can be a very time-consuming process [67] and they do not provide an overview of the spatio-temporal patterns at once. However, spatio-temporal visualizations that communicate the data in a single view are challenging to design because of the diverse nature of the data and the number of variables involved [6]. Such visualizations thus often make use of aggregation or geospatial abstractions to reduce the amount of data and its complexity (see, e.g., [1]) and, in turn, make it more accessible and understandable.
We aim to contribute to this line of research and propose using storyline visualizations to create post-play match summaries. Storyline visualizations are unique in their approach as they integrate both time and location information into a single view to depict how entity relationships develop over time (cf. [52]). To achieve this, they show interactions of actors via the proximity of lines and rely on a pre-determined set of locations. Storyline visualizations therefore make use of abstraction by translating raw geospatial trajectories into a sequence of locations that have been visited. In other words, they represent movement as so-called semantic trajectories (cf. [69]). They originated from an xkcd comic [36] and have since been used for showing the plots of movies but also for visualizing topic competition on social media [68] or group collaboration in project management settings [33]. Just as many have argued that players construct their own narrative while playing (e.g., [16,30]), watching a replay or summary visualization can also be viewed as reconstructing the story of a match.
In this paper, we apply the visualization to matches of League of Legends [47] (LoL), a popular competitive multiplayer online battle arena (MOBA) game. Since storyline visualizations show the interaction between actors, they are well-suited for team-based games to visualize team dynamics and coordination. Together with the limited set of actors involved in a MOBA match, this makes MOBAs a suitable use case for this type of visualization. The locations used for the story are derived from subdividing the LoL map into areas that reflect the map's common parts (lanes, jungle, etc.). In that sense, a storyline visualization abstracts from geospatial positions to locations that have semantic meaning for the story. To evaluate the ability of players to infer information from such a visualization, we conducted a task-based online study with 39 LoL players. In particular, we investigated a) how well and efficiently participants can solve different spatio-temporal questions based on the triad framework by Peuquet [42], b) if the number of locations affects participants' ability to solve these questions, and c) how players perceive this kind of visualization. Our results indicate that players appreciated the approach, that correctness was on a relatively high level but that time tasks were the most difficult and least efficient to solve, and that the number of locations had no influence.
Taken together, this paper contributes a first adaptation and evaluation of the interpretability and reception of storyline visualizations in a game analytics context. In doing so, we also contribute to the understanding of player needs with respect to visual gameplay review. In addition, based on our findings we discuss opportunities for further improving the design of storyline visualizations and reflect on implications for future research.

RELATED WORK
Visualizations of in-game data have found different target audiences ranging from developers to assist them in the analysis and communication of collected player data (e.g., [17,23]), to players to enable them to review their gameplay (e.g., [3,64]), and more recently to spectators to allow them to better follow live matches (e.g., [15,28]). While we also envision potential applications of our approach for spectators, this paper is focused on its use by players to review matches post-play. As such, we will focus our review on player-centric visualizations and the application of storyline visualizations in different domains.

Player-centric Visualization
Hazzard [19] distinguishes between two primary types of data visualization in games that are directed towards players: 1) status visualizations that show information about the state of the game, for instance, as part of its user interface to allow players to make informed decisions and 2) training visualizations whose goal is to help players improve their gameplay. Concerning the latter, Kleinman and El-Nasr [25] point to an abundance of data-driven systems, particularly within the context of esports, that are targeting players to help them gain experience and understand how games ought to be played. Wallner et al. [65] found that many of the reasons for viewing post-play visualizations or replays are connected to players' desire to understand and improve their gaming skills. At the same time, Wallner and Kriglstein [64], evaluating the usefulness of different visualizations for the retrospective analysis of team battles using data from World of Tanks [66], concluded that the demands and expectations of players with respect to training visualizations are very heterogeneous. As such, investigating the usefulness of different visualization approaches for summarizing gameplay data becomes important to be able to satisfy diverse player needs. However, while data-driven tools are manifold within player communities (e.g., statistics websites [39,50] or tools such as Blitz [12], Shadow [51], or Skybox [54]), they remain, despite increased efforts, still less explored from an academic perspective. Among these works, Kuan et al. [29] presented a visualization system that summarizes battles of Starcraft II [13] by providing different granular views on the data. Wallner [62] proposed an algorithm that summarizes troop movements in the form of battle maps, which are often used by historians to visually summarize military battles. Afonso and colleagues have reported extensively (e.g., [2,3]) on the development of the VisuaLeague tool to aid players in analyzing LoL matches, particularly trajectories and events. These approaches use maps to display the spatial characteristics of the data. However, play also unfolds temporally, which creates further challenges for visually summarizing in-game activities. Afonso et al. [3], for instance, rely on animation while Kuan et al. [29] provide additional visualizations (e.g., timeline) and filter options to define time ranges. MOBA Coach [21] integrates different views, one of which shows spatial data at a specific point in time that can be adjusted using a slider. In contrast, we are aiming for an integrated view of time and location information within a single visualization, without having to rely on animation or multiple views to show temporal evolution.

Storyline Visualizations
The origins of storyline visualizations can be traced back (cf. [32,59]) to Munroe's xkcd comics [36] illustrating the plots of different popular movies by showing the relationships among actors over time (see Figure 1 for an example). After the publication of the illustrations, several information visualization researchers have credited Munroe's work as inspiration for developing novel visualizations that show spatio-temporal relationships. For instance, Reda et al. [45] used a similar visual metaphor to depict evolving social structures while Kim et al. [24] used them to represent temporal relationships within genealogical data. While these are highly domain-specific, Ogawa and Ma [38], albeit focusing on visualizing the interactions between developers of software projects, were probably the first to propose a general algorithm for automatically producing storyline visualizations. However, early attempts did not reach the aesthetic appeal of hand-crafted illustrations.
Nevertheless, designing storyline visualizations is also considered a difficult task as the narrative constraints given by the story and layout constraints must be balanced (cf. [58]). As such, Tanahashi and Ma [57] have presented a set of guidelines to create legible and aesthetically pleasing layouts. Subsequently, more and more advanced approaches have been proposed such as the StoryFlow system by Liu et al. [32]. Tang et al. [59] compared how hand-drawn storylines differ from those produced algorithmically. Based on these insights, they developed the iStoryline system that improves upon previous attempts and allows for optional user interventions to tweak the created layouts. Tang et al. [58] later developed the system further by integrating a reinforcement learning model that assists users in creating storyline visualizations more efficiently. For our use case, however, we did not want players to have to manually adjust the generated layout as this would require watching the replay, which we want to avoid in the first place. As such, we rely on the above-mentioned and publicly available iStoryline framework which we modified for our purposes (see Section 4). In particular, Arendt and Pirrung [8] have found that the traditional encoding of interactions via proximity along the y-axis can be misleading as the same location can appear at different y-coordinates. Lastly, it should be mentioned that most existing algorithms rely on complete data to produce the storyline visualization. This is appropriate for post-play match summarization as applied in this paper since all the data to reproduce a match is available at this point. However, there also exist techniques (e.g., [56]) that construct storylines iteratively from streaming data, which can be more useful in the context of live spectatorship (we discuss this further in Section 7).

USE CASE: LEAGUE OF LEGENDS
LoL [47] is one of the most popular MOBA games. Matches are played between two teams (commonly referred to as the RED and BLUE team) of five players each in their attempt to first destroy the opponent's base. Each player controls a single champion that can be selected from a large roster, all having unique abilities. Like other MOBA games, LoL focuses on cooperative play and character development [70]. If a player dies, the champion respawns at the base after a short wait time. In this paper, we focus on the map Summoner's Rift which can be considered the game's main map. It is divided into three major lanes (top, middle, and bottom), the jungle, and the two base areas. Turrets (sometimes also called towers) scattered across the map automatically attack units of the enemy team on sight and provide vision to their own team. They hence form valuable assets in the game.
We have chosen LoL as a use case for several reasons. First, player interactions and team coordination are essential parts of gameplay that can be well communicated through a storyline visualization. Second, LoL has a small number of players involved, which keeps the number of actors and thus the number of lines in the visualization reasonable, in turn helping to reduce visual clutter. Third, it has a large player base that is also interested in match analysis as evidenced by various existing tools (e.g., [49,51,54]). Lastly, LoL offers access to the required game data for reconstructing a match through the public Riot API [48].
In particular, we gathered match data using the RiotWatcher [49] wrapper to access the Riot API. While the match details contain a variety of metrics and information, we extracted 1) champion movement, 2) the position and time of events (kills and building destructions), and 3) general champion information including champion name and ID (see also Section 4). With respect to champion movement, it should be noted that the coordinates of players are only offered in regular 1-minute intervals by the API, which limits the temporal resolution of the trajectories to some degree (see Section 7.1). As mentioned above, a storyline visualization works with semantic locations instead of raw coordinates. Consequently, the champions' x/y-coordinates obtained through the API were mapped to semantic locations by comparing the coordinates with the extent of the defined areas (e.g., lanes, see Figure 2 for an example). The time/location pairs then served as input to the visualization.
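The mapping from raw coordinates to semantic locations can be illustrated with a minimal sketch. It assumes, purely for illustration, that each area is an axis-aligned rectangle; the `Area` type, the `AREAS` list, the area names, and all extents are hypothetical and not taken from the actual subdivision of the map.

```python
from typing import List, NamedTuple, Optional, Tuple


class Area(NamedTuple):
    """Axis-aligned rectangle standing in for one semantic map area (hypothetical)."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


# Illustrative extents only; real areas would follow the shape of lanes, jungle, and bases.
AREAS = [
    Area("blue_base", 0, 0, 4000, 4000),
    Area("red_base", 11000, 11000, 15000, 15000),
]


def to_location(x: float, y: float, areas: List[Area] = AREAS) -> Optional[str]:
    """Return the name of the first area containing (x, y), or None if no area matches."""
    for area in areas:
        if area.contains(x, y):
            return area.name
    return None


def to_semantic_trajectory(
    samples: List[Tuple[float, float, float]], areas: List[Area] = AREAS
) -> List[Tuple[float, Optional[str]]]:
    """Convert (minute, x, y) samples into (minute, location) pairs."""
    return [(minute, to_location(x, y, areas)) for minute, x, y in samples]
```

Non-rectangular areas such as diagonal lanes would need a polygon containment test instead of the rectangle check used here.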

STORYLINE VISUALIZATION
The visualization proposed in this paper draws upon the storyline visualization technique that shows the temporal relationships of individuals to depict, for instance, the plots of novels and movies [59]. This is done by encoding time on the x-axis and location information along the y-axis. Each actor is represented by a line that diverges and converges based on whether they are in the same location at a given point in time or not.
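To make the converge/diverge rule concrete, the following sketch groups actors by shared location at each time step. It is only an illustration of the encoding, not part of the iStoryline framework; the function name `co_located_groups` and the input format are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def co_located_groups(
    trajectories: Dict[str, List[Tuple[int, str]]]
) -> Dict[int, Dict[str, List[str]]]:
    """Map each time step to the actors present in each location.

    trajectories: actor name -> list of (time, location) pairs.
    Actors listed under the same location at the same time are the ones
    whose lines would be drawn close together in a storyline.
    """
    by_time: Dict[int, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for actor, pairs in trajectories.items():
        for time, location in pairs:
            by_time[time][location].append(actor)
    return {
        time: {location: sorted(actors) for location, actors in locations.items()}
        for time, locations in sorted(by_time.items())
    }
```

For two actors who share the top lane at minute 0 and separate at minute 1, the result lists both under the same location for the first time step and under different locations for the second.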
Our implementation is based upon the iStoryline visualization framework [59] which is publicly available on GitHub. In addition, it does not require manual adjustment of the layout of the generated visualization compared to other storyline visualization algorithms such as [32]. Figure 2 provides an overview of the different elements of our implementation. In general, the visualization shows three main types of information which ranked highly in a previous study on information needs of players [65] with respect to reviewing MOBA matches: 1) the champions' movements of the own and enemy team between areas of the map, 2) the position and time of events, particularly kills and building destructions, and 3) information about fights involving multiple champions.
The movement of the ten champions is shown via color-coded lines. Warm and cold colors are assigned to the members of the RED and BLUE team respectively. In addition, dashed lines are used for the BLUE team to distinguish the champions of both teams even more. Furthermore, to adapt the framework for summarizing LoL matches, we made several adjustments:
• Storyline visualizations usually do not require the y-coordinate to have a consistent meaning over time [8], i.e., the same location can appear at different y-coordinates at different points in time. While this helps to reduce edge crossings and wiggles to produce more aesthetically pleasing layouts, Arendt and Pirrung [8] found it to also be misleading. As such, we kept the position of the locations fixed along the y-axis. The locations are shaded alternately in grey and white to visually indicate the boundaries between them.
• Champion paths in the visualization are based on the regular 1-minute coordinates as retrieved from the API plus the coordinates stored as part of the considered events. However, the API does not explicitly record respawns after a death but only the player's position at the next 1-minute mark or the next occurring event, which means the respawn at the base would not show up in the storyline. As this may be misleading, we added an additional data point shortly after each death which forces the lines to go back to the respective base of the player (see Figure 3c).
• To contextualize the movement patterns and provide additional in-game data, we included events, specifically kills and building destructions. These are depicted using color-coded icons. Champion kills are indicated by skull symbols colored according to the victim's color while building destructions are illustrated by turrets colored according to the killer's color. The icons are placed at the line of the respective player at the respective time. As the lines are drawn using Bezier curves to smoothly interpolate them between the provided time/location pairs, this may, however, result in icons being drawn in the wrong location area. To circumvent this, we added additional data points with the same location information shortly before and after the event to create a 'plateau' in the interpolated line at the time of the event to ensure that the icon is placed within the right location (see Figure 3a and 3b).
• To help players recognize fights involving multiple players, we clustered two or more temporally and spatially consecutive kills into group fights. This was done using the ST-DBSCAN clustering algorithm [11] which can cluster objects based on their spatial and temporal values. We used an implementation which requires the specification of three parameters: the minimum number of points (kill locations in our case) required to form a cluster, the maximum spatial distance, and the maximum difference in time for a point to be assigned to a cluster. Clusters were calculated on the x/y coordinates received from the API (thus in the two-dimensional game space to consider the actual geographic arrangement and not on the coordinates of the events in the abstracted storyline visualization). These were normalized and, because of floating-point precision issues with the ST-DBSCAN implementation, multiplied by 1000. In our case, we empirically set the minimum number of points to 3, the maximum spatial distance to 90 (thus approximately covering the distance of 1/11 of the map), and the maximum difference in time to 104 seconds. These group fights are visually represented by the convex hull surrounding the icons of the involved kills.
• Lastly, we added tooltips that provide additional information on demand. Hovering over a line displays a tooltip that shows the name and icon of the champion this line belongs to. Similarly, the kill and turret tooltips allow the user to check the specifics of an event (killer and victim name and icon). The grey and white location areas show a tooltip with the name of the location and a mini-map highlighting this location. However, as these areas cover the whole background, their tooltips are only triggered when keeping the left mouse button pressed.
• The minimap on the left shows the subdivision used to divide the map into semantic locations.
The left sidebar also lists the icons of the champions surrounded by a colored border to indicate which champion is represented by which color.
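Two of the adjustments in the list above lend themselves to a short sketch: inserting extra data points (the respawn point and the event 'plateau') and clustering kills in space and time. The code below is a simplified, pure-Python density-based clustering in the spirit of ST-DBSCAN, not the implementation used for the visualization; the function names, the parameter names `eps_space`, `eps_time`, and `min_pts`, and the `delta` and `delay` offsets are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

NOISE = -1  # label for kills that do not belong to any group fight


def add_plateau(pairs, event_time, location, delta=0.25):
    """Add points just before, at, and after an event so the line stays flat there."""
    extra = [(event_time - delta, location), (event_time, location), (event_time + delta, location)]
    return sorted(pairs + extra)


def add_respawn(pairs, death_time, base, delay=0.25):
    """Add a point shortly after a death that sends the line back to the base."""
    return sorted(pairs + [(death_time + delay, base)])


def st_cluster(points: List[Tuple[float, float, float]],
               eps_space: float, eps_time: float, min_pts: int) -> List[int]:
    """Cluster (x, y, t) points; returns one label per point (NOISE for outliers).

    A point is a core point if at least min_pts points (itself included) lie
    within eps_space in space and eps_time in time.
    """
    def neighbors(i: int) -> List[int]:
        xi, yi, ti = points[i]
        return [j for j, (xj, yj, tj) in enumerate(points)
                if math.hypot(xi - xj, yi - yj) <= eps_space and abs(ti - tj) <= eps_time]

    labels: List = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
        cluster += 1
    return labels
```

With the settings reported above (minimum of 3 points, spatial distance 90 on the scaled coordinates, time difference 104 seconds), three kills close together in space and time form one group fight, while a distant kill or a kill at the same spot much later is labeled as noise. Drawing the convex hull around each cluster is left out of the sketch.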

STUDY
To evaluate the visualization, we first conducted an in-person pilot study. The aim of this study was to uncover major issues with our implementation. Based on the results, we adjusted the visualization before conducting the main study. The main study employed an online survey and followed a task-based methodology to assess (RQ-1) how correctly (1a) and efficiently (1b) players can answer tasks with respect to when, where, and what queries, (RQ-2) if correctness and efficiency are influenced by the number of locations considered in the visualization, and (RQ-3) how the visualization is perceived by players. Towards this end, we employed a between-subject design using two different numbers of locations and two different LoL match replays.

Pilot Study
Before the main study, we ran a small pre-test with six participants in a quiet office room with the goal to assess if the visual elements of the visualization were understandable and if users could extract simple match information from the visualization. We recruited six local computer science students (two females and four males) between 23 and 32 years of age (M = 27.4, STD = 4.0). All had played LoL for at least two years. Before the study, participants received information about the purpose, procedure, and the voluntary and anonymous nature of the study and had to provide their written consent. Once consent was provided, participants were asked to familiarise themselves with the tool by interacting with it. Participants then had to describe what the different elements of the visualization represented. Additionally, they were asked about the clarity of information and about specific design elements such as labels, the representation of locations, colors used, etc. Next, participants were required to answer several simple information tasks related to the match such as "Please tell us approximately at which time the second battle between more than two players took place?" or "Who was killed?". Lastly, participants were asked general questions about what they liked and disliked.
Participants were able to interpret the visual elements and to answer nearly all questions correctly, although in a few cases, participants were unsure if their answers were correct. We, however, found two main issues, which we addressed before the main study. First, all participants complained that some color pairs used to differentiate the players were quite hard to distinguish. Second, participants mentioned that the order of the locations on the y-axis should be reversed. This was because we initially placed the BLUE base (and its adjacent locations) at the top of the axis although it is in the bottom left corner on the map. Consequently, we reversed the ordering, thus making this information in the tool more spatially and semantically congruent with the LoL map. Another minor request was for the sidebar to remain visible at all times. After discussion, we chose instead to make player and kill information directly accessible via tooltips. In addition, we fixed minor bugs detected during the pilot study.

Procedure.
The study was conducted via an online survey created with LimeSurvey [31]. We opted for an online setting to be able to reach a larger player base. The survey was advertised on LoL-related subreddits (e.g., r/leagueoflegends, r/supportlol), Discord channels (e.g., Riot Games Third Party Developer Community), social media, and among student gaming clubs. The survey itself was divided into three parts: 1) study information, consent, and demographics, 2) tasks, and 3) closing questions.
Study Information, Consent, and Demographics: The welcome page included a short study description followed by a consent form. Participants were required to be 18 years or older and the survey was fully anonymous and voluntary. Once participants provided their consent, the survey gathered basic demographics (age and gender with the latter following the recommendation by Spiel et al. [55]). Next, the survey inquired about their experience with LoL, particularly the number of years the game was played, the rank of the participant, and the preferred role (Top, Jungle, Mid, Bot, or Support). Participants were also asked to indicate how often they use LoL-related analytic tools and websites (never, sometimes, frequently, very frequently).
Tasks: The tasks formed the main part of the study. Before the first task, participants were introduced to the visualization and received a short summary of the different visual elements and their meaning. Then, they were invited to explore the visualization to familiarize themselves with it. For this part, a different replay was used than for the actual tasks to avoid any confounding influence. Once the participant was ready, they could proceed to the first task by pressing the 'Next' button. Subsequently, the participant was randomly assigned to one of the four stimuli (see Section 5.2.3) with which they had to conduct all the tasks. The respective visualization opened in a separate browser tab which participants were asked to keep open for the duration of the study. In case a participant accidentally closed the visualization, a link to it was also provided on the survey page of each task. The tasks (see Section 5.2.2) were presented in random order. After each task, the participant was asked to rate their perceived mental effort on the single-item Paas mental effort scale [40], a 9-point scale ranging from 1 = very, very low mental effort to 9 = very, very high mental effort. We chose this scale mainly because it is established and simple, which helped to keep the duration of the survey reasonable and to avoid fatigue. Afterward, the next task followed.
Closing Questions: Once all tasks were completed, the participants were asked to rate the visualization on the six criteria described by Wallner et al. [63]. In brief, these criteria are: clarity (is the data clearly interpretable); readability (are the visual elements easily legible and distinguishable); informativeness (does the visualization provide new and interesting insights); aesthetic appeal (is it visually appealing); accurateness (is the displayed data accurate enough); and usefulness (is the visualization useful for the tasks). These were measured on a 5-point scale anchored by 1 = poor and 5 = excellent. Additionally, we inquired if they found the number of locations into which the map was split appropriate (too coarse, about right, too detailed) as it may influence the spatial accurateness as well as the complexity of the visualization. While the survey was mainly quantitative, two free-text questions to gather further insight into participants' ratings asked 1) about what they liked/disliked about the visualization and 2) for suggestions for improvement. Finally, the participants could take part in a raffle for a chance to win one of five 20 EUR Amazon vouchers.

Tasks.
To evaluate the proposed method, we applied a task-based approach which is commonly used in information visualization research (cf. [18]) to, for instance, compare (as in our case) correctness and response time across different tasks. Given the type of displayed information, we followed an established task taxonomy for spatio-temporal data to cover typical tasks related to such kind of data. Several typologies have been proposed to date, with the triad framework by Peuquet [42] being frequently used. The framework distinguishes between three components of such data, namely location (where), time (when), and objects (what). Given two components, the framework subsequently distinguishes between three types of basic questions: when + where → what, when + what → where, and where + what → when. Later, Andrienko et al. [7], focusing on time, simplified the framework by distinguishing only between two types of tasks which either ask for time or give time to ask for other types of information (hence not covering all tasks of our scenario). Amini et al. [5] proposed a more fine-grained typology with components either being given or not and being able to refer to single or multiple instances (e.g., single time point vs. multiple time points). Despite it offering more flexibility, we nevertheless followed Peuquet's [42] framework, as Amini et al.'s [5] typology was considered too extensive for keeping the number of tasks reasonable for our online study. Kleinman et al. [26] presented a first step towards a taxonomy specifically for interactions with spatio-temporal game data visualizations that includes more high-level activities (data interaction, sense-making, and validation). Viewed within their framework, our tasks can be considered to fall roughly within the data interaction category which covers, amongst others, studying positions and movement.

Table 1. The 12 study tasks, grouped by question type of the triad framework.
No.  Task
when + where → what
1    Which champion(s) of the RED team have been at ⟨location⟩ at ⟨time⟩ minutes?
2    Which champions were killed in the group fight taking place from ⟨time 1⟩ to ⟨time 2⟩ minutes in ⟨location⟩?
3    Which champion destroyed the tower at ⟨time⟩ minutes in ⟨location⟩?
4    Which champion spent most time at ⟨location⟩ between ⟨time 1⟩ and ⟨time 2⟩ minutes?
when + what → where
5    In which area did champion ⟨champion⟩ die the first time after the ⟨time⟩ minute mark?
6    In which area(s) did the group fight from ⟨time 1⟩ to ⟨time 2⟩ minutes take place?
7    In which area did champion ⟨champion⟩ spend the most time between ⟨time 1⟩ and ⟨time 2⟩ minutes?
8    Which area(s) did champion ⟨champion⟩ visit between minute ⟨time 1⟩ and minute ⟨time 2⟩?
where + what → when
9    When was the first tower destroyed by champion ⟨champion⟩ in area ⟨location⟩?
10   From when to when did group fight(s) take place in ⟨location⟩?
11   At which time(s) was champion ⟨champion⟩ killed in area ⟨location⟩?
12   From when to when were champions ⟨champion 1⟩ and ⟨champion 2⟩ together in area ⟨location⟩ for the first time?
Following the triad framework, we have created 12 tasks, with four tasks each inquiring about when, where, and what. These tasks are summarized in Table 1 and were presented in randomized order during the survey. The specifics about locations, times, and objects were adjusted for each of the four visualizations to match the displayed data. The complete list of tasks with answers can be found in the supplementary material.

Stimuli.
For the tasks of the study we created four visualizations based on two different replays. The two replays (called R-1 and R-2) were selected such that the resulting visualizations differed in density and distribution of the data over time (to avoid potential bias from relying on one specific match only). R-1 was a rather balanced game with two equally strong teams in which much of the activity took place in the central areas of the map and thus at the vertical center of the visualization. R-2 was from a match in which the RED team gained superiority relatively early, pushing the fighting into enemy territory and putting BLUE in a defensive position. As such, the lower half of the visualization is denser than the upper part. For each replay, we created two versions based on subdividing the LoL map into a different number of locations (referred to as small and large in the following), yielding four different stimuli (see Figure 4) of which each participant received one randomly.
• The visualizations with the small number of locations are based on subdividing the map into the three lanes (top, bottom, middle), the two base areas, as well as into the jungle. However, as the jungle covers a large area of the map, we have decided to split the jungle into two areas: the top part (the section of the jungle between the top and middle lane) and the bottom part (the section between the middle and bottom lane). Otherwise, the transitions between the jungle and the other locations would have been rather ambiguous in the visualization. In total, the map was therefore divided into seven locations.
• The visualizations with the large number of locations divide the map into 17 locations by splitting the three lanes and two jungle areas into three sections each. The two base areas are the same as in the version with the small number of locations.
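The two subdivisions can be sketched as a short enumeration that reproduces the location counts given above. The location names are hypothetical labels chosen for illustration; only the counts (7 and 17) and the splitting rule come from the text.

```python
# Hypothetical location labels; only the counts (7 and 17) are taken from the text.
LANES = ["top_lane", "mid_lane", "bottom_lane"]
JUNGLES = ["top_jungle", "bottom_jungle"]
BASES = ["blue_base", "red_base"]


def small_subdivision():
    """Three lanes, two jungle halves, and the two bases."""
    return LANES + JUNGLES + BASES


def large_subdivision(sections=3):
    """Split every lane and jungle half into `sections` parts; bases stay whole."""
    split = [
        f"{area}_{i}"
        for area in LANES + JUNGLES
        for i in range(1, sections + 1)
    ]
    return split + BASES


small = small_subdivision()
large = large_subdivision()
print(len(small), len(large))
```

Keeping the bases unsplit in both versions mirrors the description above, so the two stimuli differ only in how finely lanes and jungle are resolved.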

Participants.
We received 76 complete responses of which eight were removed due to completing the survey far too quickly (< 15 min; threshold chosen based on our own pre-testing of the survey length, number of questions in the survey, and inspection of response times) and 28 due to suspected duplicate entries based on identical (or near identical) generic responses to the qualitative questions. One further response was removed as the survey was taken on a mobile phone for which the visualization was not optimized. This resulted in a final sample of 39 participants with approximately equal responses for each visualization (R-1, small: 9; R-1, large: 12; R-2, small: 9; R-2, large: 9). These respondents took about 38.5 minutes on average (STD = 15.6) to complete the survey. Participants (35 males, 2 females, 1 non-binary, and 1 preferred not to disclose) were aged between 18 and 32 years (M = 22.85, STD = 3.43).
In terms of LoL experience, 32 participants had an in-game rank (5 Bronze and below; 13 Silver or Gold; 14 Platinum or above) while seven had no rank. Typically, ranked players are more experienced than unranked ones. In terms of role preference, 18 participants preferred playing a support role and seven each preferred the top and bottom roles. Five players preferred to play mid lane and two to play a jungler. Twelve participants had 3 years or less of experience with LoL, 18 had between 4 and 9 years, and nine participants had 10 to 12 years of experience. Moreover, seven participants reported using LoL analytic tools or websites very frequently, 14 made frequent use of them, 13 participants reported using them sometimes, and only five respondents indicated that they never used one.

Analysis.
Correctness for tasks was recorded as a binary variable with 1 = correct and 0 = incorrect. For when questions a ±15 sec tolerance window was applied as the visualization only included 1-minute tick marks. In addition, we calculated the efficiency of the visualization for the different tasks. Visualization efficiency $E$ as defined by Huang et al. [22], based on previous work in instructional efficiency [61], is given by the achieved accuracy $A$ (in our case correct or incorrect) and the required cognitive cost in terms of time $T$ and mental effort $M$, mathematically:

$$E = \frac{z_A - z_T - z_M}{\sqrt{3}} \qquad (1)$$

In terms of interpretation, an efficiency score of $E = 0$ thus refers to instances when accuracy and required effort balance each other out. Please note that the equation uses $z$-scores of the variables to make them comparable as they are measured on different units. In our case, $z$-scores have been calculated across all tasks. In addition, given the online setup, we applied $z$-score outlier removal on the recorded timings for the individual tasks to remove entries for which participants took extremely long and may thus have switched to something else while doing the survey. Given the small sample, we used $|z| > 2$ (two standard deviations away from the mean) as threshold, yielding six data points to be removed, corresponding to timings of > 9.9 minutes. Given the non-normal distribution of most measures, statistical comparison was performed using non-parametric tests. For Friedman tests, Kendall's $W$ is reported as an indicator of effect size as recommended by Tomczak and Tomczak [60]. Effect sizes for Wilcoxon signed-rank and Mann-Whitney U tests were calculated, following Pallant [41], as $r = Z / \sqrt{N}$, with $N$ being the total number of observations for the former and the total sample size for the latter test. For chi-square tests, the phi coefficient $\varphi$ is provided as effect size measure (cf. [41]). Responses to the rating criteria were treated as ordinal and thus analyzed using non-parametric tests as well.
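The efficiency computation and the timing outlier filter described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: it assumes the commonly cited form of the Huang et al. measure, E = (z_accuracy − z_time − z_effort) / √3, and it assumes the sample standard deviation for the z-scores, which the text does not specify. All function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def z_scores(values):
    """Standardize values (assumption: sample standard deviation, non-constant input)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def efficiency(accuracy, time, effort):
    """Per-observation efficiency, assuming E = (z_A - z_T - z_M) / sqrt(3)."""
    z_a, z_t, z_m = z_scores(accuracy), z_scores(time), z_scores(effort)
    return [(a - t - m) / sqrt(3) for a, t, m in zip(z_a, z_t, z_m)]


def remove_time_outliers(times, threshold=2.0):
    """Keep only timings whose |z| does not exceed the threshold."""
    return [t for t, z in zip(times, z_scores(times)) if abs(z) <= threshold]


# Toy data (made up): correctness (0/1), time in seconds, mental effort (1-9).
acc = [1, 1, 0, 1, 0, 1]
secs = [40, 55, 120, 35, 150, 60]
effort = [2, 3, 7, 2, 8, 4]
scores = efficiency(acc, secs, effort)

timings = [30, 40, 35, 45, 38, 42, 36, 33, 41, 900]
kept = remove_time_outliers(timings)
```

Because each z-scored variable has mean zero, the efficiency scores also average to zero over the data they were standardized on, which is why a score of 0 can be read as accuracy and effort balancing each other out.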
To understand how the visualization could be improved with respect to the six criteria we analyzed the responses to the two open-ended questions by first using a deductive coding process. For that purpose, the answers to the open text questions (see Section 5.2) were initially segmented into text units that had coherent meaning. Two coders independently coded this set according to the six criteria used to rate the visualization (cf. Section 5.2) and then assigned a valence to each text unit to better capture participants' sentiment (e.g., negative). Afterward, the two coders met to discuss discrepancies and the remaining issues were resolved in discussion with a third reviewer. Within each criterion, the statements were labeled using inductive coding conducted by two of the authors. Again, discrepancies were resolved through consensus discussion with a third coder serving as a tie-breaker. After each step, the coding process was reviewed to ensure congruence and consistency. Furthermore, labels were reviewed for adequateness and meaningfulness.

RESULTS
In the following, study results concerning correctness and efficiency are reported first, followed by the ratings with respect to the six visualization criteria.

Correctness
Taking a look at the differences across the three question types first, Figure 5a shows the average number of successfully solved tasks for the three different types of questions. What questions were solved most successfully, followed by where questions, and lastly when questions. A Friedman test found a significant difference in the number of correctly solved tasks based on type ($\chi^2(2) = 15.2$, $p = .001$, $W = .195$). Pairwise post-hoc comparisons using Wilcoxon signed-rank tests and Bonferroni correction (resulting in a significance level of .017) showed significant differences between when and what questions ($Z = -4.021$, $p < .001$, $r = .46$) as well as when and where questions ($Z = -2.575$, $p = .01$, $r = .29$) but not between where and what questions ($Z = -2.081$, $p = .037$, $r = .24$). Accordingly, when questions were more difficult to solve than where and what questions.
To ascertain if there are differences in how correctly the different types of tasks can be answered based on the number of locations (small, large) we conducted Mann-Whitney U tests with a Bonferroni-adjusted $\alpha$-level of .017. These showed no significant differences for any of the task types with what: $U = 151.5$, $p = .26$, $r = .18$, where: $U = 188.0$, $p = .976$, $r = .004$, and when: $U = 182.5$, $p = .845$, $r = .03$. Figure 6 (top) shows the average correctness values for the 12 tasks for the visualizations using the small and large number of locations as well as for both together. Due to the large number of tasks, we refrain from pair-wise comparisons between the individual tasks to avoid inflation of Type I errors. Descriptive data displayed in Figure 6 (top) shows that, across both conditions, four tasks (T2, T3, T5, T9) were solved with an above 80% average correctness rate, three with above 60% (T4, T7, T12), and two tasks (T1, T8) slightly below it. Three tasks (T6, T10, T11) were less successfully solved with average correctness at around or lower than 50%. Next, we assessed if there are differences in successfully solving the individual tasks based on whether the task was completed with a small or large number of locations. Given the dichotomous nature of the outcome variable (correct or incorrect) we used chi-square tests, indicating no significant differences for any of the tasks at a Bonferroni-corrected $\alpha$-level of .0042.
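As a worked check of the effect sizes reported above, the following sketch recomputes Kendall's W from the Friedman chi-square statistic and r from the Wilcoxon Z values. It assumes the standard relation W = χ² / (N(k − 1)) and takes the number of observations for the paired Wilcoxon tests to be 78 (39 participants × 2 conditions); the latter is an inference that happens to reproduce the reported values, not something stated explicitly.

```python
from math import sqrt


def kendalls_w(chi_square, n_subjects, k_conditions):
    """Kendall's W derived from the Friedman chi-square statistic."""
    return chi_square / (n_subjects * (k_conditions - 1))


def effect_size_r(z, n_observations):
    """Effect size r = |Z| / sqrt(N)."""
    return abs(z) / sqrt(n_observations)


w = kendalls_w(15.2, n_subjects=39, k_conditions=3)
r_when_what = effect_size_r(-4.021, 78)    # assumption: N = 2 * 39 observations
r_when_where = effect_size_r(-2.575, 78)
r_where_what = effect_size_r(-2.081, 78)

print(round(w, 3), round(r_when_what, 2), round(r_when_where, 2), round(r_where_what, 2))
```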
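The corrected significance levels used above follow from dividing a family-wise level by the number of comparisons. The sketch below assumes a family-wise level of .05, which is not stated explicitly but matches the reported thresholds for three task types and twelve individual tasks.

```python
def bonferroni_alpha(family_alpha, n_comparisons):
    """Per-comparison significance level under Bonferroni correction."""
    return family_alpha / n_comparisons


# Assumption: family-wise alpha of .05.
alpha_task_types = bonferroni_alpha(0.05, 3)    # three task types
alpha_tasks = bonferroni_alpha(0.05, 12)        # twelve individual tasks

print(round(alpha_task_types, 3), round(alpha_tasks, 4))
```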

Efficiency
Figure 5 shows the required time (b) to solve the different tasks, the self-reported mental effort (c), and the derived efficiency scores based on Equation 1 (d), again split by the number of locations and for both together. Across the three task types, more time was needed to answer when questions than where questions which, in turn, required more time than what questions. Mental effort follows a similar trend, being lowest for what questions and increasing for where and when questions. Consequently, efficiency was lowest for when questions, while what questions were most efficient to solve. For where questions correctness and required effort were mostly balanced out. Since efficiency takes into account timings and mental effort we will restrict the statistical analysis to efficiency in the following. With respect to the influence of the number of locations on achieved efficiency, Mann-Whitney U tests with Bonferroni correction showed no significant differences in efficiency for any of the task types with what: $U = 167.0$, $p = .535$, $r = .01$, where: $U = 153.0$, $p = .310$, $r = .16$, and when: $U = 138.0$, $p = .151$, $r = .23$. Looking at the individual tasks, Figure 6 shows the timings, mental effort ratings, and derived efficiency scores for each task. Across both conditions (small and large), Tasks 1-4 (what tasks) have positive efficiency values which are higher than those for Tasks 5-8 (where tasks) which, in turn, are higher than those for Tasks 9-12 (when tasks). When tasks mostly had negative efficiency, with Task 10 (asking for multiple time spans) being least efficient to solve.
Finally, pair-wise comparisons using Mann-Whitney U tests and a corrected $\alpha$-level of .0042 indicated no significant differences for any of the tasks, whether they were performed on the visualizations with a small or large number of locations.

Ratings
Ratings with respect to the six criteria are depicted in Figure 7. Mann-Whitney U tests revealed no significant differences in the ratings with respect to the number of locations at a Bonferroni-corrected $\alpha$-level of .0083.
With respect to opportunities for improving the visualization in regard to these six criteria, we will focus on aspects mentioned by at least 10% (= 4) of the respondents. Focusing first on readability, which together with aesthetic appeal scored lowest among the six criteria, respondents mainly complained about overlapping lines which made it more difficult to follow them (7 participants). Four participants found certain colors difficult to distinguish. As there are five players on each team, finding an appropriate color scheme for the two teams is challenging. Participants did not, however, provide comments related to the aesthetic appeal. Clarity, which received average scores, appears to have been mainly diminished by participants finding it difficult to interpret the lines (6 participants), for example, due to intersections and the vertical lines indicating the respawns. Secondly, the ordering of locations on the y-axis did not always appear logical to them, impacting interpretability as well (4 participants). Participants also did not express many concerns about accurateness, which is in line with the good ratings. The main issue was that the display of events sometimes appears inaccurate (e.g., icons being slightly in adjacent areas due to space restrictions) as flagged by four participants. Regarding informativeness, which scored average to good, 11 participants mainly wished for more data and statistics (e.g., gold per minute, vision score) and eight participants for more accurate time information (e.g., exact timestamps when hovering over a line or event icon). Lastly, with respect to usefulness, four participants thought that more information needs to be included to support reasoning; otherwise, not much critique was voiced. The latter two are also the criteria on which participants focused most in their positive feedback which, for instance, lauded the usefulness for match review (5 participants), the visualization of champion paths (4 participants), and the visual representation of team fights (5 participants).
Concerning the number of locations, 72.2% of respondents exposed to a visualization with a small number of locations found the number to be about right and the remainder found it too coarse. Results for the visualizations with a large number of locations are similar, with 66.6% finding the division to be about right. However, 23.8% found it already to be too detailed. The remaining two participants still thought it was too coarse.

DISCUSSION
The storyline visualization approach presented in this paper reconstructs spatial and temporal aspects of gameplay, particularly players' movements and different events, within a single view. While such single-view spatio-temporal visualizations can contribute to summarizing play more concisely they are, as discussed earlier (cf. Section 2), still rarely used for gameplay analysis so far. Storylines are unique in their approach as they abstract locations and encode the temporal proximity of characters which can especially be useful in team-based games.
Correctness and Efficiency (RQ-1a & RQ-1b): Our evaluation showed that what questions were solved best, followed by where and when questions. However, in terms of correctness the difference between what and where questions was not statistically significant, only between these two types and when questions. In terms of efficiency, however, differences between all three task types were statistically significant, pointing to higher time and cognitive load, with the latter seemingly having the larger impact (cf. Figure 5b and 5c). Efficiency for when tasks was the lowest in relation to the other two task groups (see Figure 5d). In this case, especially the required time to solve the tasks was increased (Figure 5b). If we take a closer look at the individual when tasks we can observe that particularly tasks T10, T11, and T12 were harder to solve. These tasks required multiple answers and can thus be considered more difficult (as a question only counted as correct if all entered times were within the ±15 seconds tolerance). If we take a look at the what and where tasks which also required multiple answers (T1, T2, T6, T8) we can see that these (except T2) constitute the least well-solved tasks in their respective category as well. However, we suspect another factor to be at play here which relates to the visual saliency of the information, as the time axis only showed vertical lines crossing through all areas every 3 minutes and only small ticks otherwise (cf. Figure 2). This was originally done to avoid additional visual noise caused by too many lines in addition to the champion trajectories. The difference in correctness between T1 and T2 could be explained by T1 requiring participants to observe the lines while T2 was related to icons (which are more visually salient) in a specific location.
Contextualizing these results within the broader literature on storyline visualizations from other domains is difficult because a) work in this area has thus far focused primarily on algorithmic contributions and b) if user studies were included, the underlying data, tasks, goals, or method of assessment were different (such as in the case of, e.g., [43,53,59]). Hence, we purposefully refrain from it here as such comparisons would be unfair or skewed.
Influence of the Number of Locations (RQ-2): Correctness and efficiency did not differ depending on the number of locations, neither at an individual task level nor at the task type level. This is contrary to our initial expectations as more locations yield smaller strips in the visualization and thus location membership of events and lines might be more difficult to infer (e.g., more lines being squeezed into a smaller area). However, in terms of overall visual density the number of locations has not had much effect. Compare, for instance, Figure 4a with 4b (or, similarly, Figure 4c with 4d), which appear very similar in terms of density. This is because the number of data points (positions, events) is basically the same. This is encouraging as it suggests that more locations do not necessarily impact the interpretability of the visualization which points to a certain scalability of the approach in this regard. Interestingly, the assessment of the visualization with respect to the six criteria outlined in Section 5 was also not significantly affected by the number of locations. For clarity, readability, and aesthetic appeal this could potentially be explained by the above-mentioned similar visual density. In the case of informativeness, usefulness, and especially accurateness, the situation is, however, less clear. More locations mean more precise spatial information (i.e., less abstraction) on where the champions were located at a given point in time. Still, the visualizations using more locations did not score higher in terms of accurateness. No clear preference with respect to the number of locations could be observed either, with both subdivisions being considered to provide an appropriate level of subdivision by two-thirds of the players in each condition. This could, however, be a result of the between-subjects design as participants did not see the other version.
Visualization Design (RQ-3): Direct comparisons of the quality ratings with other gameplay visualizations are difficult due to differences in their design. However, for the purpose of contextualization, it is worth noting that these are roughly in line with the ratings of map-based visualizations showing individual unit trajectories as reported by Wallner and Kriglstein [64]. While the games in our and their study are different (LoL and World of Tanks) the number of units displayed is comparable. The low ratings with respect to readability and aesthetic appeal are also similar. We suspect this is due to the focus on individual units in both cases which can make the visualizations quickly feel cluttered to the reader. In our particular case, this matter was probably also exacerbated by the data itself. Since positions of all champions are recorded at regular intervals by the API (except events), the majority of changes between areas take place at the same time. This leads to an accumulation of vertical lines at these time points, which also hampered participants' ability to read the visualization as expressed in the qualitative feedback. While it could be argued that such issues could be circumvented by using more detailed data, our use case also illustrates the practical constraints which arise from relying on public APIs. Gameplay visualizations need to function within such limitations which makes it essential to study them with such real-world data.
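The scoring rule for when tasks with multiple answers, as described above, can be sketched as follows. The ±15 second tolerance and the all-answers-must-match rule come from the text; representing times in seconds, requiring the same number of answers, and pairing entered and expected times in sorted order are assumptions made for this illustration.

```python
def is_correct(entered, expected, tolerance=15):
    """Return 1 if every entered time (seconds) lies within the tolerance of its
    expected counterpart, else 0. Assumption: answers are paired in sorted order
    and the number of entered times must equal the number of expected times."""
    if len(entered) != len(expected):
        return 0
    pairs = zip(sorted(entered), sorted(expected))
    return int(all(abs(e - x) <= tolerance for e, x in pairs))


single_ok = is_correct([610], [600])              # 10 s off
single_off = is_correct([620], [600])             # 20 s off
multi_ok = is_correct([600, 905], [900, 610])     # both within tolerance
multi_partial = is_correct([600, 960], [610, 900])  # one answer 60 s off
```

Under such a rule, a task asking for several times or time spans fails as soon as a single entry misses the window, which is consistent with multi-answer tasks being harder to solve.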
Secondly, we kept the order of locations along the y-axis fixed to ensure consistent meaning across time (following the recommendation from Arendt and Pirrung [8]). However, this also yields less flexibility in resolving edge crossings. In general, the ordering of locations is important to help reduce edge crossings and thus visual clutter. In other words, this consistency comes at the cost of aesthetics. Previous work on storyline visualizations (e.g., [57,59]) has established such aesthetic criteria which can be more easily met if the locations can be re-arranged over time. In addition, these aesthetics are also important for the interpretation, with the number of edge crossings being known to have a large effect on the understandability of graphs (e.g., [44]). A comparison study in the gaming context in the vein of Arendt and Pirrung [8] would be necessary to shed further light on these trade-offs. In addition, a storyline visualization needs to map the locations somehow onto a one-dimensional axis. Since in our case the spatial neighborhood of locations on the two-dimensional map has direct relevance for interpretation, it is considerably harder to meaningfully arrange them along the axis in a fixed order that reflects geographical proximity (especially when the number of locations grows). Participants' feedback also indicated that the ordering was not always logical for them. This is, for instance, different compared to plotting a movie's story where the geographical arrangement of the locations is not necessarily important. While Arendt and Pirrung [8] also used geo-temporal data of places visited by people, location changes are much more frequent in our case due to the fast-paced gameplay. As such, the increased clutter caused by these two factors may offset the benefit of having fixed and consistent y-positions for the locations. Future work should therefore investigate the impact of location ordering for spatial gameplay review as the expectations and requirements may differ from other domains.
With respect to LoL, especially the respawns led to long vertical lines from the death location back to the base either at the top or bottom of the visualization. This could be circumvented by restarting the line at the base without drawing an explicit connection between the two areas. While many participants also wanted to see more data included in the visualization, this also relates to the general question of how much and which information does not need to be included. That is, how much information needs to be explicitly encoded or is implicitly governed by gameplay knowledge and thus could be left out without impacting the interpretability of the visualization (and how this differs between novices and experts).
Design Considerations: To summarize, our findings indicate that storyline visualizations can be a suitable means for spatio-temporal gameplay analysis. However, they also point to design-related improvements that could be made. In the following, we hence consolidate the findings and present design considerations based on them.
• When tasks were harder and less efficient to solve than where and what tasks (RQ-1), which we suspect could have been partly caused by the less prominent visual representation of time compared to locations. As such, we recommend ensuring that both dimensions have equal visual saliency in the storyline.
• The number of locations affected neither correctness nor efficiency (RQ-2) and had no statistically significant impact on the ratings. In particular, we could not observe any effect on clarity, readability, and aesthetic appeal which points to a certain scalability of the visualization in terms of locations and its use for more fine-grained spatial analysis. However, we still recommend keeping the number of locations reasonable as more locations can result in more complex trajectories which, in turn, can affect readability. In addition, logical geographical ordering of the locations becomes increasingly difficult.
• Readability and aesthetic appeal scored lowest among our criteria (RQ-3) which can be primarily attributed to clutter which in our case originated from keeping the locations fixed along the y-axis and the regular sampling of the data. As such, we recommend avoiding regular sampling of position information whenever possible to avoid accumulations of lines (and edge crossings) at specific time points. Detecting semantic location changes from more fine-grained trajectory data should help to improve readability in this respect.
• Related to the above, respawns created additional lines which spanned a large vertical range, thus contributing further to line clutter. Not all location changes might need to be explicitly encoded because they follow directly from the gameplay rules (e.g., respawns).
• Due to frequent location changes and thus increased likelihood of edge crossings, fixed y-positions for locations might be suboptimal in the gaming context. However, it remains open whether the dynamic location arrangement over time of the traditional storyline visualizations, which would produce a more aesthetically pleasing layout, would be superior in terms of data interpretation.
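The consideration about respawns can be illustrated with a small sketch that breaks a champion's trajectory into separate line segments at each death, so that no explicit connection from the death location back to the base has to be drawn. This is a hypothetical illustration of the idea, not part of the evaluated visualization; the data layout (time in seconds, location label) and the function name are assumptions.

```python
def split_at_respawns(samples, death_times):
    """Split a trajectory into drawable segments.

    samples: list of (time_s, location) tuples sorted by time.
    death_times: times (seconds) at which the champion died.
    A new segment starts with the first sample recorded after each death.
    """
    deaths = sorted(death_times)
    segments, current = [], []
    next_death = 0
    for t, loc in samples:
        # Close the running segment once a death lies before this sample.
        while next_death < len(deaths) and t > deaths[next_death]:
            if current:
                segments.append(current)
            current = []
            next_death += 1
        current.append((t, loc))
    if current:
        segments.append(current)
    return segments


# Hypothetical trajectory: death at 130 s, next sample is back at the base.
trajectory = [(0, "blue_base"), (60, "mid_lane"), (120, "mid_lane"),
              (180, "blue_base"), (240, "top_lane")]
segments = split_at_respawns(trajectory, death_times=[130])
```

Each segment can then be rendered as its own line, leaving the jump between death location and base implicit for readers who know the respawn rule.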

Limitations and Future Work
In terms of limitations, the temporal resolution of the storylines was limited by the match information provided by the Riot API [48] which only records champion movement in 1-minute intervals. While some additional positions are provided by the events, the trajectories of the players were still only roughly reconstructed with fine-grained movement details not being captured by the visualization. On the other hand, increasing trajectory resolution too much can contribute to making the visualization more noisy and cluttered. However, participants also considered the visualization to have proper accuracy (cf. Figure 7) for capturing the general course of a match. That said, using replay data with better temporal resolution can still improve the visualization as it can be used to derive a simplified representation that includes important transitions between locations (which would perhaps otherwise not be captured).
We also focused on a subset of the recorded data, i.e., movement, kill events, and fights. These were found to be among the most relevant features to include in visualizations of MOBA matches in a previous study by Wallner et al. [65] but are not the only ones. Not all these features lend themselves well to be displayed in a storyline visualization. However, others such as item use or leveling information could be easily added. Nevertheless, we deliberately kept the number of features small to avoid visual clutter and information overload [14]. Such information could be added in the future by including filtering options. The LoL map also has well-defined semantic areas of which the players are well aware. In other games, where players may not have such a common understanding of locations, choosing meaningful semantic locations can become more challenging.
The method presented in this paper is suitable for visualizing post-match data. This has the benefit that the complete data is available and can be considered for laying out the storylines. Despite this, we can also envision its application to streaming data to construct the storylines in real time. This would allow application in live feeds to enhance spectatorship or in live companions for real-time training. However, this requires a different layout algorithm such as the one proposed by Tanahashi et al. [56] that updates the visualization incrementally while still considering the legibility of the layout. Another use case could be to combine it with video-on-demand reviews which are also commonly employed in esports (e.g., [37]). The storyline could act as a match summary, highlighting interesting situations and allowing users to jump directly to the respective parts in the recording, thus circumventing the time-consuming need to replay the whole recording. This way, the concise spatio-temporal overview could be combined with the deep contextual data provided by the videos.
In terms of study design, the study took place in a less controlled environment (online survey) which helped us to reach a wider audience but at the same time made response time measures less reliable. Using $z$-score outlier removal, we took measures to counteract the influence of outliers as much as possible. However, especially long response times can be difficult to classify as discussed by Matjasic [35] because they can point to multitasking but also to a deeper engagement with the task. Lastly, the study only focused on evaluating how correctly and efficiently predefined tasks could be conducted with the visualization. This is an important requirement to ensure that the visualization is understandable and users can gain higher-level insights (cf. [4]). As such, future work will need to focus on studying insights players can derive from the visualization and how players will use them to improve their gameplay.
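One way such a simplified representation could be derived from higher-resolution positions is sketched below: consecutive samples in the same location are collapsed into stays, and very short stays are dropped to suppress noise. This is an illustrative assumption about how location transitions might be detected, not a method from the study; the `min_dwell` parameter, data layout, and function name are hypothetical.

```python
def location_changes(samples, min_dwell=0.0):
    """Reduce (time_s, location) samples, sorted by time, to a list of
    (start_time, location) stays.

    Stays shorter than `min_dwell` seconds are dropped (the final stay is
    always kept), and neighbouring stays in the same location are merged.
    """
    # 1) Run-length encode: keep the first sample of every run of equal locations.
    runs = []
    for t, loc in samples:
        if not runs or runs[-1][1] != loc:
            runs.append((t, loc))

    # 2) Drop short stays and merge stays that become adjacent duplicates.
    stays = []
    for i, (start, loc) in enumerate(runs):
        end = runs[i + 1][0] if i + 1 < len(runs) else float("inf")
        if end - start < min_dwell:
            continue
        if stays and stays[-1][1] == loc:
            continue
        stays.append((start, loc))
    return stays


# Hypothetical high-resolution samples with a brief detour through the jungle.
fine = [(0, "base"), (10, "base"), (20, "mid"), (22, "jungle"),
        (24, "mid"), (60, "mid"), (90, "top")]
all_changes = location_changes(fine)
filtered = location_changes(fine, min_dwell=5)
```

The dwell threshold trades detail against clutter, which mirrors the tension noted above between better temporal resolution and a noisier visualization.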

CONCLUSIONS
Storyline visualizations are unique in the sense that they depict the proximity of actors over space and time in a single view. In this paper, we adapted the storyline visualization technique to summarize the spatio-temporal evolution of LoL matches. With player communities having a high demand for tools that allow them to review and improve their gaming skills, our work contributes to recent research efforts related to player-facing analytics. Our study shows that the method holds promise for this application context with participants of our user study responding positively to the general concept.
Results indicate that questions about locations and events could generally be well-answered. However, questions about time were less easy to answer with comparably lower efficiency. The number of locations did not influence correctness and efficiency as well as the subjective rating of the visualization, suggesting a certain scalability in this respect. However, logical ordering of the locations that reflects geographical adjacency becomes more and more challenging with an increasing number of locations. Several options for improving the design that, in turn, could increase correctness and efficiency further were presented. Notably, the fixed ordering of locations shown to be beneficial for real-world spatial data may be suboptimal in the context of gameplay review and would warrant further research. We also see potential for including storyline visualizations into larger match analysis tools to provide an overview of the main flow of the match, pointing users to interesting situations for in-depth analysis.
Our study also raised an interesting point about leveraging the implicit gameplay knowledge of players to help simplify the design of spatio-temporal match visualizations, an issue that, to our knowledge, has not received particular attention so far. Lastly, while we focused on match summarization and some adjustments were tailored towards our use case LoL, we believe our results can extend to games with a similar number of players and locations and can also contribute to our understanding of the benefits and drawbacks of storyline visualizations in general, especially since work on storyline visualizations has taken a predominantly algorithmic perspective with user evaluations still being rare.

Fig. 2. Overview of the different elements of the storyline visualization with champion trajectories, locations, event icons, group fights, tooltips, e.g., for kills and areas, minimap with the spatial boundaries of the locations, champion icons, and timeline.

Fig. 3. (a) Lines in a storyline visualization are interpolated for a smooth appearance which can lead to some event icons not being displayed in the correct location, as in the case of the cyan skulls which should be in the grey areas but are in the white one. The API also does not record respawns, causing paths to proceed directly to the next recorded position or event instead of going back to the respective base. (b) Adding additional data points at the same location as the death event shortly before and after the event creates an artificial 'plateau' in the line to ensure that the icon is displayed in the correct location. (c) Adding an additional data point with the base of the killed player as location shortly after the death event ensures that the respawn is also reflected by the line.

Fig. 4. The four storyline visualizations used as part of the study are based on two replays (R-1, R-2) and subdivide the map into two different numbers of locations (small = 7 locations, large = 17 locations).

Fig. 5. (a) Average correctness, (b) time needed in seconds, (c) self-reported mental effort on a scale from 1 (very very low mental effort) to 9 (very very high mental effort), and (d) efficiency for the three different types of questions for the stimuli with a small and large number of locations as well as across both. Error bars show 95% confidence intervals.

Fig. 6. From top to bottom: average correctness, time needed in seconds, self-reported mental effort on a scale from 1 (very very low mental effort) to 9 (very very high mental effort), and efficiency for the individual tasks used in the study. Measures are shown individually for the stimuli using a small and large number of locations as well as across both. Error bars show 95% confidence intervals.

Table 1. Tasks used in the online survey, grouped by task category. In the task text, placeholders stand for a location (where), a time (when), or an object (what).