Enhancing Auto-Generated Baseball Highlights via Win Probability and Bias Injection Method

The automatic generation of sports highlight videos is an emerging topic in both the sports entertainment industry and the research community. Earlier methods for generating highlights rely on visual-audio cues or contextual cues, so they may not capture the overall flow of the game well. In this paper, we propose a technique based on Win Probability Added (WPA), an empirical sabermetric baseball statistic, to generate baseball highlights that better reflect in-game dynamics. Additionally, we introduce methods for generating "biased" highlights toward one team by systematically manipulating WPAs. Through a mixed-method user study with 43 baseball enthusiasts, we found that participants evaluated WPA-based highlights more favorably than existing AI highlights. For (un)favorably biased highlights, the game result (win/loss) was the dominant factor in user perception, but bias directions and strengths also had nuanced effects on perception. Our work contributes to the development of automated tools for generating customized sports highlights.


INTRODUCTION
A sports highlight consists of the most important moments of a game. Sports highlights are a great way for people to catch up on notable plays from games they missed or did not have time to watch in full. A growing number of academic and industry groups have been developing automated sports highlight generation, focusing on sports such as soccer [8], cricket [36,37], and golf [20]. For example, companies like IBM [32] and NAVER [28] have developed AI to generate highlight videos for tennis and baseball games, respectively, and REELY [31], Cognitive Mill [2], and Sizzle [38] cover a broader range of esports game highlights. These automated systems primarily draw on event-oriented features, using motion recognition and scoreboard detection [4,20,35], along with excitement indicators such as audio intensity and live comment analysis [10,19].
Yet, existing methods of extracting important events have been limited in reflecting the overall flow of the game [13]. While excitement features based on audio intensity and live comment analysis can spotlight distinct moments of heightened tension, they can leave out the nuanced dynamics and tactical shifts that shape the larger narrative of the game. For example, object/event-oriented features from motion recognition usually focus on individual events, missing the broader context in which they occur. Similarly, although scoreboard detection emphasizes scoring plays, it may fail to capture strategic maneuvers and defensive efforts that are integral to the game. Including events that increase a team's likelihood of winning, even when they do not lead to a score, could provide a more comprehensive view of the game in sports highlights.
In baseball, such game-changing moments can be quantitatively detected with Win Probability Added (WPA). WPA is a metric from sabermetrics, a popular form of empirical statistical analysis grounded in historical baseball data [6,18,25]. WPA quantifies the effect of each player's actions on their team's chances of winning, taking into account game circumstances such as the score, inning, number of outs, and baserunners on the field. As a result, large shifts in WPA, as indicated by their absolute values, can signal the importance of the corresponding events within the game. Because of these characteristics, WPA can be an effective tool for identifying key moments in a game's narrative [27,44]. We therefore propose a standard WPA-based approach that builds highlights from the key events with the highest absolute WPA values.
Extending our standard WPA-based approach, we propose a method that systematically adjusts WPAs to inject bias into highlight videos, so that fans of a particular team can enjoy them. In particular, we focused on how sports fans with different allegiances perceive a game. For example, a closely contested victory may be exciting for supporters of the winning team but frustrating for fans of the losing team. Considering these contrasting experiences of fans of opposing teams, some broadcasters and streamers have begun to provide intentionally biased commentary that favors one team and is aimed at that team's fans [22]. Despite the potential of biased sports content, little is known about how to generate biased highlights, and users' perceptions of biased highlights remain largely underexplored. Therefore, we introduce a method of injecting intentional bias into baseball highlights by systematically modifying the weighting of WPA, which can emphasize the positive plays of one team over another.
The primary goal of this paper is to propose a method for generating baseball highlights based on WPA and a method for injecting bias into them. Specifically, we implemented a standard WPA-based highlight generation method that selects the key events with the highest absolute WPAs. The selected events are then arranged in order of occurrence and combined to form the highlight video. For biased highlights, we developed a method that adjusts WPAs according to "bias direction" (i.e., which team receives the favorable bias: favorite team vs. opposite team) and "bias strength" (i.e., how strong the adjustment is: strongly weighted vs. weakly weighted). After the WPAs are systematically manipulated under each bias direction and bias strength condition, the events with the largest adjusted WPA values are included in the biased highlights.
Following the development of the standard and biased WPA-based highlight generation methods, we conducted a mixed-method user study to explore how viewers perceive WPA-based highlights. A total of 43 baseball enthusiasts participated in the study, evaluating three types of highlights: existing AI-generated highlights, standard WPA highlights, and biased WPA highlights. Specifically, we first evaluated the general quality of standard WPA-based highlights. Participants watched two highlights from the same game: one generated by the standard WPA method and the other by an existing AI highlight generation solution (NAVER AI highlights [28]), whose highlights mainly included scoring moments. We then explored users' perceptions of biased highlights by generating four types of biased highlights, combining two factors: bias direction (favoring either the favorite team or the opposing team) and bias strength (either strong or weak). Participants viewed the four types of biased highlights and rated them on comprehensiveness, interestingness, and fairness. Brief interviews were conducted after participants viewed and rated each highlight video to better understand their overall experience and impressions. Finally, we conducted semi-structured interviews to investigate participants' previous experiences with and expectations of automatically generated highlights.
Through this user study, we aim to address the following research questions:
• RQ 1. How do participants evaluate baseball highlights generated using the standard WPA-based approach, compared to existing AI highlights that mainly consist of scoring events (NAVER)?
• RQ 2. How do participants evaluate biased highlights with different bias direction and strength conditions?
• RQ 3. What expectations do participants have for improving auto-generated baseball highlights?
Overall, our findings revealed that standard WPA-based highlights were rated as more comprehensive and interesting than NAVER AI highlights. Participants reported that they felt the flow of the game better with the WPA-based highlights than with the existing solution. In the evaluation of biased highlights, we found that the game result (win/loss) was the dominant factor in user perception, but bias direction and strength also had nuanced effects on users' perceptions. Finally, by investigating participants' previous experiences and expectations for auto-generated highlights, we found that users generally had disappointing experiences with existing AI highlights. At the same time, they held high expectations for the future of auto-generated highlights, emphasizing the need for personalized sports highlights that accommodate their diverse needs and preferences.
As a whole, this paper makes the following contributions:
1) Proposing a novel approach for automatically generating baseball highlight videos using Win Probability Added (WPA) and exploring user perceptions of WPA-based baseball highlights.
2) Introducing a bias injection framework and exploring user perceptions of biased baseball highlights.
3) Identifying key aspects of generating automated sports highlights that can accommodate viewers' diverse preferences and needs.

BACKGROUND AND RELATED WORK
2.1 Automatic generation of sports highlights
There has been growing interest in automated sports highlight videos in the sports entertainment industry and the research community. Owing to their advantages in production speed and cost, AI highlight solutions have been applied across a variety of sports, such as baseball [28,30], basketball [4], soccer [8,14], cricket [36,37], golf [20], tennis [32], and esports [2,10,19,31,38,41].
To automate sports highlight generation, AI techniques require segmenting play scenes and extracting key events [3,7,9,36,37]. To extract the key moments that make up automatic highlights, many sports highlight generation models have used visual-audio cues [4,20,35]. From visual cues, methods such as motion recognition and scoreboard detection have been used to identify key events. In addition, audio cues from spectators and commentators have been used to identify and extract the most exciting moments in the game. Moreover, some engines have incorporated 'contextual' cues from outside the game, such as live comments [10,19] and tweets [14,43], to identify key moments in which live viewers display a high level of interest.
However, although current methods for generating sports highlights using visual-audio and contextual cues are commonly employed, they may be limited in accurately capturing the dynamics of the game [13]. For example, while object-oriented features from motion recognition excel at recognizing specific on-field events, they may fall short in capturing the narrative or the circumstances that led to or arose from those events. Similarly, relying on scoreboard detection primarily emphasizes scoring plays, potentially neglecting the critical build-ups, defensive strategies, or underlying intricacies that shape the game. Additionally, excitement features extracted from audio intensity effectively highlight moments of heightened emotional response, such as roars, gasps, and impassioned commentary. However, they can inadvertently overlook quieter yet pivotal moments in which tension slowly builds. To address these limitations, we propose a highlight generation method based on Win Probability Added (WPA) that can capture the game's flow and provide a more comprehensive view of key events.

Baseball informatics and WPA
To address the potential limitations of prevalent highlight generation methods, we consider a novel highlight generation method that uses sabermetrics. Sabermetrics is the empirical study of baseball statistics that quantitatively measures in-game activity [6,18,25]. Research in sabermetrics, which began in the mid-20th century, has been used extensively by baseball teams and in sports analytics. A broader audience became acquainted with sabermetrics in part through the book and subsequent film "Moneyball" [24].
Win Probability Added (WPA) is a sabermetric statistic that accounts for the situation in which a particular event occurs, which allows it to capture the salient moments of a game [27,40,44]. WPA quantifies the importance of an event by measuring the change in Win Expectancy (WE), the percent chance that a particular team will win given the score, inning, outs, and runners on base [33,39]. Understanding WPA requires a grasp of the basic structure of baseball. The game is played over nine innings, with teams alternating between batting and fielding. Each half-inning continues until the fielding team records three outs, achieved through events such as striking out the batter or catching a fly ball. Once a batter hits the ball and safely reaches a base, the batter becomes a runner, aiming to advance around the bases to score runs. Ultimately, the team with the most runs after nine innings wins.
Win Expectancy goes beyond simply tracking the game's progression and current score. It provides a dynamic assessment of winning probabilities that accounts for the nuanced state of the game at any given moment. With this granularity, WPA can identify important events that are not only related to scoring but also significant in the larger context of the game at the exact moment of the event. For example, a "caught stealing" (when a baserunner attempts to advance to the next base during a pitch but is tagged out before reaching it safely) can be considered pivotal given its context. Recognizing this utility, Major League Baseball (MLB) recently included Win Probability data in the "Key Moments" feature of its updated MLB.com website [1]. This underscores MLB's effort to boost fan engagement by providing deeper insights into critical game moments.
With advancements in deep learning, research on predicting win probabilities has grown significantly, especially in esports. Notable work has developed comprehensive frameworks for jointly predicting win rates and performance in video games, aligning with the evolving needs of esports analytics [48]. Another study examined real-time win prediction in MOBA games, adopting a dual perspective that considers both the overall confrontation and individual movement dynamics [50]. Other work has applied machine learning to predict outcomes in popular games such as League of Legends, involving detailed analyses of raw game data and novel modeling techniques [16]. Research has also introduced the MassNE model for predicting the outcomes of massive battles in StarCraft, which accounts for the complex interactions between different squads [12]. Win probability calculation has also been adopted in traditional sports, including basketball [23] and volleyball [17]. This broad range of applications of win probability (WP) data underscores the versatility of WPA-based methods.
Although Win Probability Added (WPA) holds promise, it has not been widely used for generating sports highlights. An exception is a recent study that applied shifts in win-loss probabilities to identify pivotal moments in esports games [21]. That study used a Multi-Layer Perceptron (MLP) model for win-loss classification to identify significant changes in win-loss probabilities and used these shifts to generate game highlights. Our study builds on this work by demonstrating how WPA can be used effectively to generate baseball highlights. In particular, while the prior esports study primarily focused on identifying win-loss probability changes at specific moments, our research introduces a novel method that applies different weights to each team's WPA to generate biased highlights. This method allows targeted emphasis on particular teams or players, adding a new dimension to sports highlight generation.

AI and Bias
In AI-infused systems, bias often presents significant challenges, predominantly resulting in outcomes perceived as unfavorable or even discriminatory [47,49]. With increasing concerns about AI bias, researchers have discussed various concepts of algorithmic fairness [26] and presented systematic methods for debiasing algorithms [11]. Efforts to promote AI fairness have aimed to improve the trustworthiness and equity of AI-infused systems by actively reducing unintended biases.
There is also a risk of bias in auto-generated sports highlights. For example, audio cues such as crowd intensity may skew toward one side based on factors like game location (home or away) or the popularity of certain players or teams, regardless of what actually happens in the game. Additionally, identifying key moments from audience chat reactions could be skewed by imbalances in chat volume that stem from the popularity of certain teams, potentially leading to significantly biased highlights [10]. To counteract biases stemming from such skewed cues during key moment selection, a WPA-based highlight generation approach could help produce more reliable and comprehensive highlights.
At the same time, in the context of sports viewing, some biases may be perceived as natural and even positive. For example, a recent study found a clear demand for sports broadcasts and commentary that are biased in favor of a viewer's preferred team or player [22]. Outside sports, several studies have also revealed people's general preference for AI outcomes that favor their own interests. For example, Wang and colleagues found that people rated algorithms as fairer when the AI's predictions were favorable to themselves, even when they recognized the algorithms' inherent biases [49]. Although bias in AI-generated sports highlights could thus be perceived positively, little research has explored how such bias is perceived.

WPA-BASED BASEBALL HIGHLIGHT GENERATION METHOD
3.1 Extract Important Plays using WPA
As a way of identifying key moments in sports games and generating automatic highlights, we propose using WPA, a metric from sabermetrics, the branch of statistical analysis specifically tailored to baseball records. WPA measures how much a player's actions increase or decrease their team's chances of winning. WPA is calculated from win expectancy (WE), a team's probability of winning given the game situation at each play; WPA is the difference in WE between two consecutive plate appearances, that is, the change in WE produced by the intervening event. Specifically, a plate appearance encompasses the fine-grained situation detailing the pitcher, the batter, and the status of first, second, and third base. Consequently, events such as hits (when a batter hits the ball and reaches at least first base without being put out), flyouts (when a batter hits the ball in the air and it is caught for an out), strikeouts (when the batter is put out after three strikes), and walks (when the batter is awarded first base after four balls, i.e., pitches outside the strike zone) are considered crucial moments, since they typically change the situation on the bases. Accordingly, the situation following each of these events has its own win expectancy. WPA can be positive or negative depending on the change in win expectancy: a positive WPA means the event increases the team's odds of winning, and a negative WPA means it decreases them. WPA factors in not just the score but also the detailed situation at the time of the event. Consequently, the WPA value can differ for the same event depending on the specific circumstances at that moment.
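To make the WE-to-WPA relationship concrete, the following Python sketch computes an event's WPA as the change in win expectancy across the event, credited to the batting team. It assumes that WE is stored from the home team's perspective and is looked up from a table keyed by game state. The `State` encoding, the `event_wpa` function, and the `we_home` table with its probabilities are hypothetical placeholders, not real KBO win-expectancy data and not the values behind Figure 2.

```python
from typing import Dict, Tuple

# Hypothetical game-state key:
# (inning, is_bottom_half, outs, bases, home_runs_minus_away_runs)
# where bases is a 3-character string such as "1__" (runner on first only).
State = Tuple[int, bool, int, str, int]


def event_wpa(we_home: Dict[State, float], before: State, after: State) -> float:
    """WPA of one event, credited to the batting team.

    we_home maps a game state to the home team's win expectancy (0..1).
    The home team bats in the bottom half; because the away team's WE is
    1 - (home WE), the sign flips when the away team is batting.
    """
    delta_home = we_home[after] - we_home[before]
    batting_is_home = before[1]
    return delta_home if batting_is_home else -delta_home


# Made-up win expectancies, chosen only to show context sensitivity.
we_home: Dict[State, float] = {
    (7, True, 1, "1__", -1): 0.30,  # home trails by one, runner on first
    (7, True, 1, "___", 1): 0.75,   # ...after a two-run home run
    (3, True, 0, "123", 4): 0.92,   # home already leads by four, bases loaded
    (3, True, 0, "___", 8): 0.98,   # ...after a grand slam
}

close = event_wpa(we_home, (7, True, 1, "1__", -1), (7, True, 1, "___", 1))
blowout = event_wpa(we_home, (3, True, 0, "123", 4), (3, True, 0, "___", 8))
print(f"close-game home run: {close:.2f}, blowout grand slam: {blowout:.2f}")
```

Even with these invented numbers, the same event type (a home run) receives a much larger WPA in the tight game than in the blowout, which is the property the highlight selection relies on.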
In Figure 2, we present two scenarios that show how WPA can vary for the same event (a home run by the home team) under different conditions. A home run occurs when the batter hits the ball into fair territory and circles all the bases back to home plate. This also allows any baserunners to advance and score, as depicted by the orange diamond in Figure 2. In Case 1, the home team, behind by one run, hits a two-run home run with a runner on first base, changing the score to 2:3. Case 2 depicts a grand slam, in which the batter hits a home run with all three bases occupied, scoring four runs and producing a new score of 0:8. Although both instances involve home runs, their circumstances and their impacts on the game's result differ significantly. The home run in Case 2 scores more runs than the one in Case 1 but occurs when the home team is already comfortably ahead. In Case 1, by contrast, the home run occurs in a tighter situation, so its impact on the game's momentum is more significant. Therefore, Case 1's WPA (.186) is almost twice as high as Case 2's (.096). This comparison illustrates the usefulness of WPA as a context-sensitive statistic for identifying key events.
The algorithm for selecting key events based on WPA is as follows (see Table 1). We calculate the WPAs of all events in a game and convert them into absolute values (lines 3-5). To determine the most crucial events within a game, events are sorted by their absolute WPA values (line 6). Events with higher absolute WPAs are then included as segmented scenes, and those scenes are positioned according to the flow of the game (lines 7-12). The number of selected events can vary depending on the length of the highlight video. Additionally, we include the last-out scene (the final out recorded in a game, marking its conclusion), since it is standard in baseball highlights (line 13).
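The selection steps described above can be sketched in Python as follows. This is a minimal illustration, not a reproduction of Table 1. It assumes that each event carries a clip duration, that the highlight length is enforced as a time budget, and that clips are added greedily in order of absolute WPA, skipping any clip that would overflow the budget (Table 1 may instead stop at the first overflow). The `Event` fields and `select_highlight` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    seq: int                   # position in the original play-by-play order
    wpa: float                 # signed WPA of the event
    duration: float            # length of the event's video clip, in seconds
    is_last_out: bool = False  # True for the game's final out


def select_highlight(events: List[Event], budget_seconds: float) -> List[Event]:
    """Pick events by |WPA| until the length budget is used, then restore game order."""
    last_out = next((e for e in events if e.is_last_out), None)
    # Reserve room for the last-out scene, which is always included.
    used = last_out.duration if last_out is not None else 0.0
    candidates = [e for e in events if not e.is_last_out]
    # Rank by absolute WPA: the largest swings in win expectancy come first.
    ranked = sorted(candidates, key=lambda e: abs(e.wpa), reverse=True)
    chosen = []
    for e in ranked:
        if used + e.duration <= budget_seconds:  # skip clips that would overflow
            chosen.append(e)
            used += e.duration
    if last_out is not None:
        chosen.append(last_out)
    # Arrange the selected scenes in their original order of occurrence.
    return sorted(chosen, key=lambda e: e.seq)


game = [Event(0, 0.02, 20), Event(1, -0.15, 20), Event(2, 0.30, 20),
        Event(3, 0.05, 20), Event(4, 0.01, 20, is_last_out=True)]
print([e.seq for e in select_highlight(game, budget_seconds=60)])  # → [1, 2, 4]
```

Ranking uses absolute values, so a large negative swing (event 1) is kept alongside a large positive one (event 2). The final chronological sort is what turns the ranked set back into a watchable sequence.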

Injecting biases into Highlights
In addition to the standard method for generating baseball highlights using the absolute value of WPA, we propose a bias injection framework that weights each team's WPAs. Our bias injection method makes a given team's positive plays (e.g., its hits and home runs, and strikeouts by its pitchers) more important while diminishing the importance of its negative plays (e.g., walks, or bases on balls, issued by its pitchers). In addition, the strength of the bias can be calibrated, so that the bias intensity can be customized to specific user preferences.
The WPA weighting method is as follows (see Table 2). To generate highlights biased in favor of one's favorite team, our bias injection model multiplies that team's positive WPAs by a value greater than 1 to increase their absolute values. For its negative WPAs, it multiplies them by a value less than 1 to decrease their absolute values (lines 10-12). Conversely, for the opposite team, the model multiplies positive WPAs by a value less than 1 to reduce their absolute values and multiplies negative WPAs by a value greater than 1 (lines 13-19). Using these adjusted WPAs, we reorder the events and select the scenes for the biased highlights (lines 21-29).
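A minimal Python sketch of this reweighting is shown below. It assumes that each play's WPA is stored from the perspective of the team credited with it (e.g., the batting team), that the opposite team receives the same pair of multipliers in mirrored roles, and that events are then ranked by the absolute value of the adjusted WPA, as in the standard method. `biased_wpa`, `rank_biased`, and the team labels are hypothetical names, not taken from Table 2.

```python
from typing import List, Tuple

# (event id, team credited with the WPA, signed WPA from that team's perspective)
Play = Tuple[str, str, float]


def biased_wpa(wpa: float, team: str, favored: str, up: float, down: float) -> float:
    """Reweight one WPA; `up` > 1 amplifies and `down` < 1 dampens."""
    if team == favored:
        # Favored team: amplify its positive plays, dampen its negative ones.
        return wpa * (up if wpa > 0 else down)
    # Opposite team (assumed mirror): dampen its positive plays, amplify its negative ones.
    return wpa * (down if wpa > 0 else up)


def rank_biased(plays: List[Play], favored: str, up: float, down: float) -> List[str]:
    """Order event ids by the absolute value of their adjusted WPA, largest first."""
    adjusted = [(pid, biased_wpa(w, t, favored, up, down)) for pid, t, w in plays]
    return [pid for pid, _ in sorted(adjusted, key=lambda x: abs(x[1]), reverse=True)]


plays = [("a", "HOME", 0.10), ("b", "AWAY", 0.15),
         ("c", "HOME", -0.12), ("d", "AWAY", -0.08)]
print(rank_biased(plays, favored="HOME", up=1.0, down=1.0))  # → ['b', 'c', 'a', 'd']
print(rank_biased(plays, favored="HOME", up=2.0, down=0.5))  # → ['a', 'd', 'b', 'c']
```

With neutral weights, the away team's hit (b) and the home team's costly out (c) lead the ranking. Under strong bias toward HOME, the home team's good play (a) and the away team's setback (d) move to the top, which is how a favored team's positive moments come to dominate a length-limited highlight.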

EVALUATION METHODS
4.1 Constructing Test sets
To evaluate the effectiveness and perception of our WPA-based methods, we generated highlights of 48 baseball games from the Korea Baseball Organization (KBO) League. The KBO League is the top professional baseball league in South Korea and consists of 10 teams. Baseball is one of the most popular sports in South Korea, and KBO games receive extensive global coverage, including broadcasts by ESPN [34].
To address potential effects of participants' prior familiarity with the games used in the study, we deliberately selected games from September 2019, about three years before our experiment. We assumed that the elapsed time would make these games less memorable to participants, thereby minimizing the influence of prior exposure. A total of 48 games from this period were collected and used to generate the highlights. For comparison, we also collected the available AI highlight videos of these 48 games from NAVER, a major online sports news platform in South Korea, which generates them with its own AI solution that mainly targets scoring moments [28].
We then generated highlights of the 48 games using our WPA-based highlight generation methods. Five highlights were generated for each game: one with the standard WPA-based method and four with the bias injection method, one for each of the 4 bias conditions (2 bias directions × 2 bias strengths). The two bias directions were toward the home team and toward the away team, and the two bias strengths were strong and weak. Including the highlight generated by NAVER, there were six highlights per game, for a total of 288 (48 × 6) highlights constructed for the evaluation.
To generate WPA-based highlights, WPAs were calculated from the official game records, which consist of event labels, situations, and time information. The standard WPA-based method (Table 1) was then used to generate neutral highlights, and the bias injection method (Table 2) was used for biased highlights. In the standard WPA-based method, highlights were generated using the absolute values of WPA. For the biased highlights, different weights were applied to the WPA values according to each bias condition. We empirically determined two levels of bias strength (strong/weak). For strong bias, WPA increases were doubled (×2) and decreases halved (×0.5), whereas for weak bias, increases were multiplied by 1.5 and decreases by 0.75.
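The condition grid and multipliers in the two paragraphs above can be written out as a small configuration, which also checks the arithmetic behind the 288 highlights. This is an illustrative sketch; the constant and function names are hypothetical, and only the multipliers, directions, and game count come from the text.

```python
from itertools import product

# (multiplier for WPA increases, multiplier for WPA decreases), as stated in the text.
BIAS_STRENGTHS = {"strong": (2.0, 0.5), "weak": (1.5, 0.75)}
BIAS_DIRECTIONS = ("home", "away")
N_GAMES = 48


def highlight_conditions():
    """List every highlight version produced for one game."""
    conditions = [("naver", None, None), ("standard", None, None)]
    # 2 bias directions x 2 bias strengths = 4 biased versions.
    for direction, strength in product(BIAS_DIRECTIONS, BIAS_STRENGTHS):
        conditions.append(("biased", direction, strength))
    return conditions


per_game = len(highlight_conditions())
print(per_game, N_GAMES * per_game)  # → 6 288
```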
Further, we matched the length of these highlights to that of the NAVER highlights for comparison. The average length of the generated highlights was about 4 minutes (M = 4:04, SD = 1:07). Since there are six highlights for each game, the lengths of the highlights within each game were made as similar as possible, so that they differed by only a few seconds. In particular, the lengths of the NAVER highlights and the generated highlights were nearly the same (R² = 0.94). The average per-game standard deviation of highlight lengths was 8.63 seconds (minimum 0.76 seconds, maximum 20.89 seconds).
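The bias check in the next paragraph relies on two composition measures: the share of a highlight's plays that are a given team's positive plays, and the share of highlight time those plays occupy. A minimal Python sketch follows, assuming that both shares are taken over all plays and the total length of one highlight. The `Clip` tuple layout and `positive_play_shares` are hypothetical, and the sample highlight is invented.

```python
from typing import List, Tuple

# (team, signed WPA from that team's perspective, clip duration in seconds)
Clip = Tuple[str, float, float]


def positive_play_shares(clips: List[Clip], team: str) -> Tuple[float, float]:
    """Return (positive event ratio, time proportion) of `team` in one highlight."""
    # A play counts as positive for a team when its WPA is above zero.
    positive = [c for c in clips if c[0] == team and c[1] > 0]
    event_ratio = len(positive) / len(clips)
    total_time = sum(duration for _, _, duration in clips)
    time_proportion = sum(duration for _, _, duration in positive) / total_time
    return event_ratio, time_proportion


highlight = [("WIN", 0.10, 30.0), ("LOSE", 0.20, 20.0),
             ("WIN", -0.05, 10.0), ("WIN", 0.30, 40.0)]
print(positive_play_shares(highlight, "WIN"))  # → (0.5, 0.7)
```

Reporting both measures guards against a case where a team has many short positive clips but little screen time, or the reverse.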
The bias in favor of a particular team in our generated highlights can be demonstrated by examining the proportion of each team's positive plays. An event was classified as positive or negative according to the sign of its WPA: a positive WPA indicates a positive play, whereas a negative WPA indicates a negative play. From the ratio of positive plays and the time allocated to them in the generated highlights, we could verify how the bias direction and strength conditions were applied. The positive event ratio of a team T in a highlight was calculated as follows:
$$\text{PositiveEventRatio}_T = \frac{N^{+}_{T}}{N}$$
where $N^{+}_{T}$ is the number of team T's positive plays in the highlight and $N$ is the total number of plays in the highlight. The time proportion was calculated as follows:
$$\text{TimeProportion}_T = \frac{\sum_{e \in E^{+}_{T}} d_e}{D}$$
where $E^{+}_{T}$ is the set of team T's positive plays in the highlight, $d_e$ is the duration of the scene for play $e$, and $D$ is the total length of the highlight. In calculating these, we made comparisons based on the final outcomes of the games, specifically between the winning and losing teams.
The graph in Figure 4 displays the percentage of positive plays made by the winning team, shown in blue. For example, when the highlights were strongly biased toward the winning team, the winning team's positive plays occupied a significantly larger portion than in the NAVER highlights. In contrast, when the highlights were strongly biased toward the losing team, the proportion of the winning team's positive plays decreased significantly, while that of the losing team noticeably increased. This shows that our bias injection methods effectively reflected the intended bias directions and strengths, as the proportion of a given team's positive plays varied with each bias condition and game result.

Participants
We recruited participants through postings on university community websites. Participants were required to be regular viewers of KBO games and to have a strong allegiance to one of the KBO teams. We recruited 43 participants (30 male and 13 female) with an average age of 29.0 (SD = 5.99). The study was conducted online via Zoom and took about 45 minutes, and each participant received a gift voucher worth $10. The evaluation protocol was reviewed and approved by the Institutional Review Board of the university where the study was conducted.

Task
As shown in Figure 1, our evaluation involved participants reviewing a set of 6 highlights organized into three pairs: Pair 1 was for evaluating the standard WPA-based method, and Pairs 2 and 3 were for evaluating the bias injection method. Each pair consisted of two different versions of highlights from the same game. Before the experiment, participants declared their favorite team. Based on this information, games were randomly selected according to the condition of each pair.
To evaluate the standard WPA-based method, we selected one game in which the participant's favorite team did not play, to control for the effects of fan allegiance. Pair 1 thus consisted of the highlight from the commercial solution (NAVER) and the neutral highlight generated by the standard WPA method for the same game.
Next, to evaluate the bias injection method, two games involving the participant's favorite team were selected. Pair 2 consisted of two highlights of one game, strongly biased toward the favorite team and toward the opposing team, respectively. Pair 3 consisted of two highlights of the other game, weakly biased toward the favorite team and toward the opposing team, respectively.
To control for order effects, we randomly arranged the two highlights within each pair, and the sequence of the pairs was also randomized. Because each pair contained two highlights derived from a single game, participants viewed them consecutively to maintain the context of the game.
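The two-level randomization described above can be sketched in Python. The sketch shuffles the order of the pairs, then shuffles the two videos inside each pair, while keeping same-game videos adjacent. The pair labels, video ids, and the `viewing_order` function are hypothetical, and the seeded `random.Random` is used only to make a session reproducible.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, List[str]]  # (pair label, two highlight video ids from one game)


def viewing_order(pairs: List[Pair], rng: random.Random) -> List[Tuple[str, str]]:
    """Shuffle pair order and within-pair order, keeping each pair's videos adjacent."""
    shuffled = list(pairs)
    rng.shuffle(shuffled)  # randomize the sequence of pairs
    order = []
    for label, videos in shuffled:
        videos = list(videos)
        rng.shuffle(videos)  # randomize the order within the pair
        order.extend((label, v) for v in videos)  # same-game videos stay consecutive
    return order


pairs = [
    ("pair1", ["naver", "standard"]),
    ("pair2", ["strong_favorite", "strong_opposite"]),
    ("pair3", ["weak_favorite", "weak_opposite"]),
]
for label, video in viewing_order(pairs, random.Random(7)):
    print(label, video)
```

Shuffling at both levels, rather than shuffling all six videos at once, is what preserves the requirement that the two highlights of a game are watched back to back.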

Procedures
In each session, participants received a Zoom link and joined the call. After an explanation of the experimental procedure and signing the consent form, participants watched six highlights. As described above, the two highlights within each pair were randomly arranged, and the sequence of the pairs was also randomized. To ensure the integrity of the study and minimize bias, the experimental condition of each highlight video was not disclosed to participants. After watching each highlight, participants answered the same set of survey questions rating its comprehensiveness, interestingness, and fairness. They were then briefly interviewed about their overall impressions and experience of watching that highlight. At the end of the experiment, they took part in semi-structured interviews about their prior experiences with and expectations of auto-generated highlights. The interviews lasted about 13 minutes on average (M = 13:28, Md = 12:36, SD = 3:58). Overall, each experimental session lasted about 45 minutes.

Survey Measures
To understand user perception of the highlights in a multifaceted way, we designed a questionnaire of 16 adjectives based on prior research. As there were no standardized multifaceted evaluation criteria specifically for sports highlights, we adopted the semantic differential scale (18 adjectives) for evaluating news articles proposed by Sundar [42] and the scale (6 adjectives) for assessing video summarization from He et al. [15]. Like news articles, sports highlights must present key events and information in a constrained format. Furthermore, both sports highlights and video summarization aim to condense extended sequences into shorter versions, ensuring that key moments are included while preserving a coherent narrative flow. Considering these similarities, we adapted these criteria for evaluating baseball highlights. We then refined the criteria to 16 items by eliminating duplicates (i.e., clear, concise, coherent, informative) and items not meaningfully related to sports highlights (e.g., believable, sensational). Some items were modified to suit sports highlights (e.g., "well written" to "well made"). The final questionnaire items are shown in Table 3. Users rated each highlight on a 7-point Likert scale ranging from highly disagree (=1) to highly agree (=7).

Interview Procedures
Participants were asked to take part in brief interviews after watching each highlight and in an exit interview at the end of the experiment.
In the interviews conducted after each highlight, participants were first encouraged to express their overall impressions and thoughts about the highlight freely, so that we could learn what they liked or disliked about the highlight they had just watched. Next, participants were asked whether there were any specific survey items with which they strongly agreed or disagreed, and were asked to elaborate on the reasons for their ratings. Additionally, they were encouraged to articulate any differences they noticed between the two highlights in a pair. In the exit interview, we asked participants about their past experiences with existing auto-generated highlights, focusing on their expectations for such technology and any limitations they perceived.

Quantitative Analysis.
Participants rated each highlight on the sixteen 7-point Likert items shown in Table 3. Factor analysis with Varimax rotation indicated three factors: Comprehensiveness (This highlight video "contains all key points," "is comprehensive," "accurate," and "I don't need to watch the entire game video because of this highlight"), Interestingness (This highlight video is "boring" (reversed), "pleasing," and "interesting"), and Fairness (This highlight video is "biased" (reversed), "fair," and "objective"). These factors accounted for 45.4%, 11.3%, and 8.6% of the variance, respectively. The remaining six items (well-made, lively, satisfactory, concise, coherent, clear) were excluded because they either loaded on multiple factors or loaded weakly (cutoff: .70). By averaging the items loading on each factor, we created measures of perceived comprehensiveness (Cronbach's α = .85), interestingness (Cronbach's α = .79), and fairness (Cronbach's α = .88).
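The composite measures and reliability coefficients above follow standard formulas. The sketch below, using only the Python standard library, reverse-codes a negatively worded item on the 1-7 scale, averages a factor's items into one score per participant, and computes Cronbach's α = k/(k-1) * (1 - sum of item variances / variance of the summed scores). The ratings are made-up illustrative numbers, not study data, and the function names are our own; the Varimax factor extraction itself is not shown.

```python
from statistics import mean, variance

SCALE_MIN, SCALE_MAX = 1, 7  # 7-point Likert scale


def reverse_code(ratings):
    """Reverse a negatively worded item such as 'boring': 1 <-> 7, 2 <-> 6, ..."""
    return [SCALE_MAX + SCALE_MIN - r for r in ratings]


def composite(items):
    """Average the items of one factor into a single score per participant."""
    return [mean(scores) for scores in zip(*items)]


def cronbach_alpha(items):
    """Cronbach's alpha for k items, each a list of ratings from the same participants."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]           # summed score per participant
    item_variance = sum(variance(scores) for scores in items)  # sample variance of each item
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Hypothetical ratings from five participants for the Interestingness items.
boring = [2, 3, 5, 1, 4]
pleasing = [6, 5, 3, 6, 4]
interesting = [6, 4, 3, 7, 5]

items = [reverse_code(boring), pleasing, interesting]
print(round(cronbach_alpha(items), 3))  # 0.963
print([round(score, 2) for score in composite(items)])
```

Reverse-coding has to happen before both steps; otherwise a negatively worded item would pull the composite in the wrong direction and depress α.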
For the quantitative analysis of RQ1 (standard WPA-based highlights vs. the commercial solution), we conducted a paired t-test on each of the above measures. For the analysis of RQ2 (comparisons among four bias conditions), we conducted paired t-tests to examine the main effects of bias direction (favorite vs. opposite team) and bias strength (weak vs. strong). Because of the nature of baseball games and our test sets, the game results (favorite team win vs. loss) could not be completely balanced, unlike bias direction and strength. Therefore, we used independent t-tests to examine the effect of the game outcome. To investigate potential interaction effects further, additional paired t-tests and two-sample t-tests were conducted for each condition. To counteract the multiple comparison problem, Bonferroni correction was applied. In addition, the effect size of each comparison was quantified with Cohen's d; by convention, d values of 0.2, 0.5, and 0.8 are considered small, medium, and large, respectively [5]. Finally, the viewing order of highlights was randomized to mitigate potential order effects (as detailed in Section 4.4), and order had no significant effect on any measure (p > .05 for each). Therefore, we did not include order in our analysis models.
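For reference, the two test statistics named above can be written out directly. The sketch below computes the paired-samples t (for within-participant comparisons) and Student's pooled-variance two-sample t (for the unbalanced win/loss comparison) using only the standard library. It returns each statistic with its degrees of freedom but not a p-value, and the sample ratings are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def paired_t(x, y):
    """Paired-samples t: the same participants rated both highlights."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / sqrt(variance(diffs) / n)   # mean difference / its standard error
    return t, n - 1                               # df = n - 1


def student_t(x, y):
    """Two-sample t with pooled variance (equal-variance assumption)."""
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2                         # df = n1 + n2 - 2


# Hypothetical 7-point composite scores for five participants.
wpa_scores = [5, 6, 4, 5, 7]
commercial_scores = [4, 5, 4, 3, 6]
print(paired_t(wpa_scores, commercial_scores))
print(student_t(wpa_scores, commercial_scores))
```

With 43 participants, a paired test has df = 42, matching the t(42) values reported for RQ1. The reported t(170) is consistent with 172 biased-highlight ratings (43 × 4) split into win and loss groups, although that accounting is our inference. In practice, `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind` (whose default `equal_var=True` corresponds to the pooled version) return the same statistics together with p-values.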

Qualitative Analysis.
To understand user perceptions of each highlight and to identify suggestions for improving automated highlight generation methods, we fully transcribed the interview audio files and analyzed the transcripts. The audio files were transcribed using Clova Note [29], a widely used transcription tool. Because this tool not only automated the initial transcription but also allowed manual review and editing, the lead author meticulously checked and revised each transcript, ensuring accuracy and familiarity with the data. Using thematic analysis [45], two authors independently examined the transcript data to develop initial codes. Each author separately scrutinized the transcribed data, broke the transcripts down into individual sentences, and created initial codes, resulting in 1513 initial codes. These initial codes were then reviewed collectively by the research team. We focused on clustering similar codes and resolving discrepancies in code naming to keep the coding process coherent. As we continued refining and categorizing these code clusters, four high-level themes emerged: "Previous Experience with Automated Highlights," "Conceptualization of Fairness," "Evaluation of WPA-Based Highlights," and "Expectations for Automated Baseball Highlights."
The second part of the user study examined how biased highlights, varying by bias strength, bias direction, and the win/loss of the participant's favorite team, affected participants' evaluations. To examine the effects of bias direction (bias toward one's favorite team vs. the opposing team) and bias strength (strong vs. weak) on participants' perceptions of baseball highlights, we ran a series of paired t-tests. Because of the nature of baseball games and our test sets, win/loss could not be completely balanced, unlike bias direction and strength. Therefore, we used independent t-tests to examine the effect of the game outcome.

Main Effect.
As shown in Table 4 and Figure 6, win/loss had a significant effect on comprehensiveness (t(170)=3.20, p < .01, Cohen's d=.49) and interestingness (t(170)=4.36, p < .001, Cohen's d=.66). Participants perceived highlight videos of their favorite team's winning games as more comprehensive (M=4.56, SD=1.34) than those of losing games (M=3.85, SD=1.54). Similarly, they found highlight videos of games their favorite team won (M=4.80, SD=1.16) more interesting than those of games it lost (M=3.98, SD=1.31). This suggested that the outcomes of sports matches carry significant personal meaning for fans' experiences and perceptions: victories are associated with positive emotions, potentially elevating fans' satisfaction. However, win/loss had no significant effect on fairness ratings (t(170)=.43, p > .05). Further, there was no significant main effect of bias direction or strength on any evaluation metric (p > .05 for each), nor was there a significant interaction effect of bias direction and strength on any evaluation metric (p > .05).

Bias Direction & Win/Loss Interaction.
Focusing on win/loss, which had significant main effects on multiple measures, we further investigated potential interaction effects between factors by running a series of simple effects analyses. A simple effects analysis breaks down an interaction by examining the effect of each independent variable at each level of the other independent variable. There were four cells (2 bias directions × 2 game outcomes), and the number of pairwise comparisons among four cells is $\binom{4}{2} = 6$. We used the Bonferroni correction method to adjust the significance threshold accordingly (α = .05/6 ≈ .008).
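The count of pairwise comparisons among the four cells of the 2 × 2 design, and the resulting Bonferroni-adjusted threshold, can be checked with a few lines of Python. This is an illustrative sketch; the cell labels are our own shorthand.

```python
from itertools import combinations

# The four cells of the 2 (bias direction) x 2 (game outcome) design.
cells = [(direction, outcome)
         for direction in ("favorite", "opposite")
         for outcome in ("win", "loss")]

comparisons = list(combinations(cells, 2))    # every pair of cells: C(4, 2)
alpha = 0.05
bonferroni_alpha = alpha / len(comparisons)   # per-comparison significance threshold

print(len(comparisons), round(bonferroni_alpha, 4))  # 6 0.0083
```

Rounded to .008, this is the threshold used in the simple-effect tests for both bias direction and bias strength.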

QUANTITATIVE FINDINGS
The quantitative findings had two parts. The first part compared evaluations of baseball highlights generated by the standard WPA-based method with those of the existing AI highlights (RQ1), using paired t-tests on comprehensiveness, interestingness, and fairness ratings. Overall, the standard WPA-based highlights were rated as more comprehensive and interesting than the existing AI ones (NAVER).
The second part examined the impact of bias strength, bias direction, and the favorite team's game outcome on participants' evaluations of highlights (RQ2). We further investigated potential interaction effects between conditions by running a series of simple effects analyses. Our findings suggest that while the win/loss outcome prominently influenced user perceptions, the direction and strength of bias also subtly shaped them.

Evaluations on the standard WPA-based method (RQ1)
The first part of our user study compared baseball highlights produced with the standard WPA-based method against those from existing AI solutions. A series of paired t-tests compared the comprehensiveness, interestingness, and fairness ratings of the standard WPA-based highlights and the commercial solution (NAVER). As shown in Figure 5, comprehensiveness ratings differed significantly between the standard WPA-based highlights (M=4.57, SD=1.31) and the commercial solution (M=3.48, SD=1.42); t(42)=4.27, p < .001, Cohen's d=.79. Similarly, interestingness ratings differed significantly between the standard WPA-based highlights (M=4.61, SD=1.21) and the commercial solution (M=4.10, SD=1.37); t(42)=-2.98, p < .01, Cohen's d=.39. However, fairness ratings did not differ significantly between the WPA-based (M=4.73, SD=1.48) and commercial solution (M=4.50, SD=1.51) highlights; t(42)=.81, p > .05.
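The exact formula behind the reported effect sizes is not stated. The sketch below uses one common variant for comparing two conditions, which divides the mean difference by the square root of the average of the two variances. Plugging in the rounded summary statistics for RQ1 gives values close to the reported d=.79 and d=.39; small discrepancies are expected because the inputs are rounded to two decimals.

```python
from math import sqrt


def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Cohen's d with the square root of the average variance as the standardizer."""
    standardizer = sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_1 - mean_2) / standardizer


# Rounded summaries reported for WPA-based vs. commercial (NAVER) highlights.
comprehensiveness = cohens_d(4.57, 1.31, 3.48, 1.42)
interestingness = cohens_d(4.61, 1.21, 4.10, 1.37)
print(round(comprehensiveness, 2))  # 0.8  (reported: .79)
print(round(interestingness, 2))    # 0.39 (reported: .39)
```

Applied to the win/loss summaries (4.56/1.34 vs. 3.85/1.54 and 4.80/1.16 vs. 3.98/1.31), the same function gives about .49 and .66, in line with the values reported for the main effect of game outcome.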

Evaluations on biased baseball highlights based on bias direction, strength, and win/loss (RQ2)
We examined how the favorite team's win/loss interacted with the bias direction condition. An interaction effect was found only in the comprehensiveness ratings. As shown in Figure 7, when the highlight videos were biased toward participants' favorite team, participants perceived them as significantly more comprehensive when their favorite team won (a: M=4.61, SD=1.38) than when it lost (c: M=3.74, SD=1.48); t(84)=2.78, p < .008, Cohen's d=.60. In contrast, when the highlight videos were biased toward the opposing team, the difference between game outcomes in perceived comprehensiveness was not significant (b-d: t(84)=1.73, p > .008). This result indicated a nuanced influence of the favorite team's game outcome (win or loss) on participants' perceptions of biased highlights. For games that the favorite team won, participants tended to rate the comprehensiveness of the highlights more favorably, even when these biased highlights disproportionately featured events that were less significant in the broader context of the game. This finding suggested that fans' perception of content comprehensiveness was intricately linked to their emotional engagement with the game's outcome. A victory by the favorite team appeared not only to enhance the perceived value of the highlights but also to color viewers' perception of their comprehensiveness, regardless of the actual content.

Bias Strength & Win/Loss Interaction.
We examined how the favorite team's win/loss interacted with the bias strength conditions, regardless of bias direction. As shown in Figure 8, when participants viewed the strongly biased highlight videos, they perceived the highlights as significantly more comprehensive when their favorite team won (a: M=4.66, SD=1.31) than when it lost (c: M=3.64, SD=1.69); t(84)=3.13, p < .008, Cohen's d=.68. In contrast, for the weakly biased highlights, the difference between game outcomes in perceived comprehensiveness was not significant (b-d: t(84)=1.35, p > .008).
These results underscored the nuanced role that bias strength played in shaping sports fans' perceptions of content, particularly in relation to their favorite team's performance. Strong biases appeared to both enhance and detract from the viewing experience, depending on whether the favorite team won or lost. This highlighted the delicate balance content creators must strike when presenting biased highlights to a diverse audience with varying loyalties.

QUALITATIVE FINDINGS
Through the analysis of transcripts from both the brief interviews conducted after each highlight and the exit interviews, we identified four key findings: participants' "pre-existing perceptions and experiences with existing automated baseball highlights," "conceptualization of fairness in automated baseball highlights," "evaluations of WPA-based baseball highlights," and "suggestions for improving automated baseball highlights."

Pre-existing perceptions and experiences with conventional automated baseball highlights
The majority of participants (n=30) had been aware of AI highlights and had watched them several times before the study. They said they watched AI highlights because these were usually available almost immediately after games, whereas human-made highlights required some processing time. P20 said "People who make highlights need time to edit, so they upload highlights 2 or 3 hours after the game. However, AI highlights are uploaded quickly, so I watched them a few times." P3 added "The reason I tried watching AI highlights is that it is usually very fast. Since the AI highlight could be uploaded almost immediately after the game, I always had expectations that the highlights would be uploaded soon." However, most participants who had watched automatically generated highlights reported feeling disappointed (n=28), and they said they rarely watched them anymore. They said they would rather wait longer for human-edited highlights, which they found better than AI highlights. P8 said "At the end of the game, AI highlights are uploaded first and followed by human-made highlights by the editor. The AI highlight, however, I found a little awkward. Even if it comes up later, I prefer to watch the human-made highlights."
Participants preferred human-edited highlights primarily because they felt that AI highlights often overlooked vital aspects of the game's progression, focusing solely on scoring scenes. P2 commented "As soon as the game is over, the [AI] highlights are uploaded quickly. It's definitely fast, but to be honest, I don't expect much. I think baseball is a game of flow. Highlights should have included scenes where the flow changes dramatically. However, AI highlights often miss such crucial scenes. For example, I think one point in an early inning is much more important when the game is tight than scoring three runs after the score gap has already widened. However, there seems to be a tendency for AI to highlight show scenes that do not need to be shown." P15 added "Once I saw an AI highlight and a human-made highlight uploaded, I had watched both once. After that experience, I never went back to watching AI highlights. In comparison to the ones made by humans, I found that many parts that I considered important were missing [in the AI one]." In addition, in terms of conveying the excitement of the game, some participants reported that human-made highlights were better than AI ones. P10 said "I think the feeling of excitement is a little less in AI highlights. [Human-made] highlights often include something like noise of crowd cheering, booing, vivid sound when a bat strikes the ball and so on." These experiences with existing AI-generated baseball highlights revealed a clear demand for highlights that authentically capture the essence and flow of the game. Yet the current AI highlights, which focused predominantly on scoring, needed enhancement to represent key aspects of the game's flow more effectively.

Conceptualization of fairness in automated baseball highlights
One interesting finding of the study was that participants held different conceptions of fairness and bias in baseball highlights. When answering the fairness items after watching each highlight, participants perceived fairness or bias in different ways. About half of the participants (n=23) understood it as an equal proportion of positive plays by the two teams in the composition of the highlights. For example, P32 said "I felt that this highlight was biased a lot since only the [Team A]'s hitting and scoring scenes continued to appear. If the highlight showed the atmosphere of the [Team B] coaches and the atmosphere of the dugout, I would have known more about [Team B]'s efforts to get out of this crisis. But it's a pity that it wasn't." Yet they also acknowledged that it would be difficult for the highlights to be perfectly fair if the game was too one-sided. P15 commented "More or less, highlights should have focused on the winning team. Since it was a one-sided game, I think there was a slight bias towards the side that scored more runs. So, in terms of proportion, it seems inevitable that it was a little bit out of balance." Similarly, P9 added "In a game that is led by one team, there should be more scenes for the winning team. It was difficult to determine whether this was due to the biased AI or it was just the way the game went." Another group of participants (n=16) saw fairness as whether the highlight adequately covered all aspects of the game. They thought fair highlights should include scoring scenes, pitching, and skillful defense, as well as interesting moments in the flow of the game even when these were not directly related to scoring. P7 said "The highlight was so biased towards the scoring scene, so I couldn't figure out how the pitcher was running the game." This perspective closely linked their understanding of fairness with the concept of comprehensiveness in baseball highlights.
Therefore, these findings underscored the importance of considering the subjective nature of fairness when generating and evaluating automated sports highlights, since what was considered 'fair' varied widely among viewers. This called for a more nuanced approach to designing automated highlight generation systems, one that accommodates diverse fan perspectives to enhance the viewing experience for a broader audience.

Evaluations on WPA-based Baseball Highlights
Overall, participants rated the WPA-based highlight videos favorably. Even though we did not disclose the specific conditions under which each highlight was generated, they found that the WPA-based highlights captured the flow of the game better than the NAVER ones.
In the interviews conducted after they watched the highlight videos in Pair 1 (Commercial Solution (NAVER) vs. WPA-based highlight (neutral)), over half of the participants (n=24) reported getting a better sense of the game's flow from the WPA-based highlights than from the NAVER ones. P4 said "I think they [two highlights, generated by WPA-based method and existing method] are quite different. In the case of this [WPA-based] highlight, I think it was impressive to contain a little bit of an intermediate process, such as the process of creating a chance." P34 also added "This [WPA-based highlight] video was satisfying because I came to know the overall flow, including a thrilling chase in the later innings, although the game didn't turn around, though. I think that this is a video that serves its original function as a baseball highlight." Further, P40 said "I found this video [WPA-based highlight] more interesting and less boring than the previous one [NAVER highlight] because it has more important details. It has detailed content about the bottom of the 3rd inning, the most important moment of the game." The quantitative results showed that the game result, that is, the win or loss of the participant's favorite team, heavily influenced participants' evaluations of highlight videos. The following sections present how game results profoundly affected participants' perceptions of WPA-based highlights.

We Won, We Won!
Overall, highlights tended to be evaluated positively in terms of comprehensiveness and interestingness when participants' favorite teams won. Regardless of the bias direction, participants rated the highlight video favorably when their team won the game. Although we did not reveal our study conditions to participants, most participants (n=38) recognized when a highlight video was biased toward one team over the other: "Personally, I feel this video is a little bit biased toward the [Team C]" (P36).
Even when they clearly noticed that the video was biased toward their favorite team, participants enjoyed watching it when their team won. In these cases, participants felt that the highlight was comprehensive because it included more of the important plays that contributed to their team's win. P15 commented "I'm not sure if it is because I'm an [Team D] fan, but there are a number of nice [Team D] plays in the highlight, which I think is great. So, I thought this highlight was well made. I think that the other team's play seems to be slightly under-represented, but I personally felt better." When their team won, participants also said that even a highlight biased toward the other team was good for understanding the overall flow of the game. For example, P15 said "This one has more defensive plays as well as how we got outs. I feel this one is more lively and well-digested." On the other hand, some participants, like P5, felt that highlights in this case often contained unnecessary scenes: "I think it was very biased towards the opposing team. It wasn't very interesting because some scoring plays seem to be forced to be included in the video just to show that it is fair."
Feeling the Hurt of Defeat, Yet We Put up a Good Fight.
In contrast, participants generally gave less favorable ratings for lost games, even when the highlights were biased toward their favorite team. For example, P20 said "I think this highlight is clearly focused on my team. It was a shutout game by the opposing pitcher. [...] My team lost the game, so I cannot rate it highly for enjoyment, interest, or satisfaction." However, a smaller number of participants (n=5) found the highlights biased toward their favorite team more satisfactory and enjoyable because the videos showed their team's efforts. P43 said "I think it is interesting since it is trying to show the progress of the game comprehensively."
P42 added "When I watched this highlight, I could at least figure out that we lost but put up a good fight. So, I personally think this one is better." When the highlights were biased toward the opposing team and their team lost, participants commonly gave low ratings. P43 said "I think that this video was made with a big bias towards the winning team. For the winning team, this would be a great highlight. For the losing team, this is a highlight that really upsets me." However, even in this case, some participants thought that the unfavorably biased videos contained the key plays of the game, so they perceived them as more comprehensive. For example, P11 said "Compared to the previous highlight, this one has more crucial plays. I think this one is much better. Of course, it hurts when my team loses. But this one includes important moments of the game."

Towards a better automated baseball highlight
In the exit interview, participants gave us feedback on how to improve the quality of automated highlight videos, making them more enjoyable and more personalized. Even though they reported not liking the existing automated highlights, participants said they would be willing to watch them if the generation method improved. In particular, they expressed varied opinions about the direction of personalization. Some wanted to see every play of their favorite player: "I'm looking forward to things to see the entire play of [Baseball Player A] in the game. Since I am a fan of his playing, AI could generate such videos regardless of whether he scored or not" (P2). Others wished to see the positive plays of their favorite team from different angles. Specifically, they said it would be preferable if the AI could edit the highlight in a way that made them feel less bad about the game when their team lost: "If AI can, for example, show highlights related to my team regardless of the win or loss, people would not feel bad even if their team loses" (P6). Moreover, some participants suggested incorporating fundamental structural details that they considered crucial for comprehending the overall game, such as each team's lineup and pitching changes: "I think the highlight should include lineup first no matter what. I also think that pitching changes should be there, too." (P12). In addition, some participants suggested that the first scoring run or the game-ending play would be essential even if the play had no effect on the win/loss. P1 said "For whatever highlight, I think the first run and the game-winning run should be shown."
Another opinion was that it would be good if the AI could capture specific game patterns well, such as a pitcher's duel: "I wonder if AI could translate the game when the game was a pitcher's duel. It appears that AI only focuses on the batter going to the base. I hope it also captures good scenes of pitchers" (P26). There was also demand for entertaining scenes not directly related to the game. P3 said "It is missing some fun elements such as giving a child a foul ball, trying to catch a home run ball." Lastly, there was a technical request for AI highlights to look more natural. Specifically, P12 mentioned "It still seems a bit awkward when the screen transition happens. If improved, I think it will be a lot more fun to watch." Furthermore, P10 requested a customizable highlight generation UI: "It would be nice if there was a technology to generate such a highlight where the user could select what they want to see. For example, I might only want to see the plays, or I might want to see the audience's reaction as well."

DISCUSSION
In this paper, we introduced a method for automatically generating highlights using WPA and a method for injecting bias into highlights. Through a user study, we evaluated our methods in terms of comprehensiveness, interestingness, and fairness. The first part of the study showed the general effectiveness of the WPA-based method compared to an available commercial solution (NAVER). The second part showed that win/loss heavily affected the evaluation of biased highlights. Lastly, we explored participants' suggestions regarding auto-generated highlights through exit interviews. Based on our findings, we discuss implications for the automatic generation of highlights, focusing on the potential of the WPA-based method, bias in sports highlights, and the further development of auto-generated highlights.

Potential of WPA-based Highlight Generation
In our research, we introduced an automated method for generating highlights based on WPA and validated its effectiveness through both quantitative and qualitative assessments. Compared with one of the most popular AI highlight solutions for baseball in South Korea (i.e., NAVER AI highlights), the highlights produced by our WPA-based method were perceived as more comprehensive and interesting, which participants attributed to their capturing the game's flow. While there are numerous effective approaches for automatically generating highlights using various audio-visual cues and deep-learning techniques, they often fail to adequately convey the overall game flow [13]. As reported in the qualitative findings, many participants felt that current AI solutions did not meet their expectations, primarily because they failed to capture the game's flow. Overall, the findings suggested that our WPA-based baseball highlight generation method could better portray the overall game flow, leading to enhanced viewer satisfaction.
To apply our approach to other domains, it would be crucial to have historical statistics that can identify significant shifts in win probability. In this study, we used Win Probability Added (WPA) from sabermetrics, a metric specific to baseball. Nevertheless, the fundamental principles of WPA could be adapted to other areas where win probabilities can be quantified. Many sports, including basketball [23], volleyball [17], and esports [12,48,50], have already incorporated similar frameworks for calculating win probabilities. Further, there are various sport-specific statistics analogous to sabermetrics, such as APBRmetrics in basketball [23] and volleymetrics in volleyball [17]. Given the abundance of data in various sports domains, our highlight generation methods could be further extended and improved through sophisticated statistical analysis and deep learning techniques.
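To illustrate how the idea could carry over to any sport that provides a win-probability trace, the sketch below computes per-event WPA as the change in one team's win probability and keeps the k largest swings in chronological order. This is a generic illustration under our own assumptions (a simple top-k rule on |WPA|, hypothetical function names, and made-up probabilities), not the paper's highlight selection procedure.

```python
def wpa_series(win_probs):
    """Per-event WPA from a win-probability trace for one reference team.

    win_probs[0] is the pregame probability; win_probs[i] is the probability
    after event i, so event i's WPA is win_probs[i] - win_probs[i - 1].
    """
    return [after - before for before, after in zip(win_probs, win_probs[1:])]


def top_k_events(win_probs, k):
    """0-based indices of the k largest |WPA| swings, restored to game order."""
    wpa = wpa_series(win_probs)
    ranked = sorted(range(len(wpa)), key=lambda i: abs(wpa[i]), reverse=True)
    return sorted(ranked[:k])


# Hypothetical home-team win probabilities before the game and after six events.
trace = [0.54, 0.58, 0.49, 0.51, 0.72, 0.70, 0.95]
print(top_k_events(trace, 3))  # [1, 3, 5]
```

Because WPA from the opposing team's perspective is simply the negation, ranking by |WPA| selects the same events whichever team's trace is used, which is what makes this neutral selection a natural baseline.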

Win/Loss vs. Bias in Highlights
In addition to the standard method for generating baseball highlights using WPA, we proposed a bias injection framework that adjusts the weights of WPA values according to different bias conditions. This approach effectively amplified the positive plays of a specific team while simultaneously diminishing its negative plays. In our quantitative evaluations, biased highlights were perceived as more comprehensive and interesting when the viewer's favored team won. Notably, the effects of bias could be identified only in combination with game outcomes (win/loss), not from the direction or strength of the bias alone.
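The weighting idea can be sketched as follows. This is a minimal illustration and not the paper's exact formula: we assume a hypothetical bias strength `b` in [0, 1) that multiplies the magnitude of plays where the favored team gained win probability by (1 + b) and plays where it lost win probability by (1 - b), after which the top-k plays are selected. A larger `b` would correspond to a stronger bias, and `b = 0` reproduces the unbiased |WPA| ranking.

```python
def biased_scores(wpa, favored_sign, b):
    """Reweight per-play WPA toward one team (hypothetical weighting scheme).

    wpa: per-play WPA from the home team's perspective.
    favored_sign: +1 to favor the home team, -1 to favor the away team.
    b: hypothetical bias strength in [0, 1); 0 means no bias.
    """
    scores = []
    for w in wpa:
        gain_for_favored = favored_sign * w          # WPA seen from the favored team's side
        if gain_for_favored > 0:
            scores.append(abs(w) * (1 + b))          # amplify the favored team's positive plays
        else:
            scores.append(abs(w) * (1 - b))          # diminish the favored team's negative plays
    return scores


def select_plays(scores, k):
    """Indices of the k highest-scoring plays, restored to game order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


wpa = [0.14, -0.12, 0.08, -0.10, 0.05]               # hypothetical home-team WPA per play
print(select_plays(biased_scores(wpa, +1, 0.0), 2))  # [0, 1]  neutral
print(select_plays(biased_scores(wpa, +1, 0.5), 2))  # [0, 2]  favors the home team
print(select_plays(biased_scores(wpa, -1, 0.5), 2))  # [1, 3]  favors the away team
```

Under this scheme, the favored team's big plays rise in the ranking while its costly plays sink, so the same game yields different highlight sets depending on direction and strength.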
In the context of sports, bias is unavoidable. Most games are competitions between teams, and usually only one team wins. Therefore, it might not be plausible to generate a completely unbiased and balanced highlight when the result of the game is already biased toward one team. Even if we could technically generate highlights that show the positive plays of both teams equally, doing so would not fit the fundamental reason why people love watching sports. Further, highlights made in such a way may not reflect the flow of the game, so we did not consider conditions that sought arithmetic fairness. Moreover, in the qualitative findings, we found that the concept of fairness in sports highlights differed considerably among individuals. This nuanced understanding of fairness complements our quantitative findings, which showed no statistically significant differences in perceived fairness across bias conditions.
The concept of fairness is highly context-dependent, according to many studies [46,49]. Our results showed that the most important context for sports highlights was whether the favored team won or lost. Therefore, different highlight generation criteria could be considered depending on whether one's team wins or loses. For example, for winning games, participants enjoyed highlights emphasizing either their favorite team's positive plays or its negative plays. This suggested that participants expected to see winning games from various angles.
However, when one's favorite team loses, viewers' moods typically decline, so bias should be injected with caution. Our findings indicated that strongly biased highlights of games that participants' favorite team lost were perceived as less comprehensive and less interesting. Therefore, it might be helpful to offer highlights from the most neutral perspective in such scenarios of defeat. A potential adjustment might be to generate highlights emphasizing the positive plays of the viewer's favorite team only upon request. Such highlights could feature moments where the favorite team showed commendable effort but narrowly missed winning opportunities, shifting the emphasis to the narrative: "We lost but put up a good fight."

Future of Automated Sports Highlight
Although participants were not satisfied with the automated highlights currently in service, they still had various expectations for automated highlights, especially in terms of personalization. First, many wanted highlights to reflect their preference for a team or a player. In addition, some wanted to be able to choose highlights focusing on offense or defense. Furthermore, other participants suggested showing winning games from various angles and making highlights of losing games less unpleasant.
Participants also had various demands regarding the structure of automated highlights. Primarily, they thought that sports highlights should be structured like conventional human-made highlights, for example by including the lineup, the opening run, and the final at-bat. These structural elements can be easily extracted and integrated into future generation systems. Furthermore, participants requested a stronger sense of presence. Several participants suggested including elements that convey the atmosphere of the stadium, such as audience reactions and the coach's expressions. These elements may be difficult to detect with WPA-based methods, so combining our methods with existing scene detection techniques could be a promising direction for improvement.

FUTURE WORK AND LIMITATIONS
We have identified several limitations of our study. These limitations indicate areas for improvement and also point to directions for future research.
First, our method, tailored for baseball, may face challenges in sports where quantifying the likelihood of winning is more complex. Although there have been efforts to calculate win probability in various sports and esports, these models may not be as rigorously validated or as precise as those in baseball.
Second, our study did not extensively explore fine-tuning the bias ( ) or examine its effect based on the game outcome, which would be crucial for user perception. Therefore, a more thorough investigation of the bias injection framework will be essential, particularly one focused on refining the bias ( ) and understanding its impact. Additionally, a detailed analysis of the selected games, including the trends of win expectancy (WE) or WPA, could enhance our understanding of how game dynamics influence highlight generation.
Another limitation is that, while we emphasize the significance of personalized highlights, the design of the current study did not permit participants to choose their preferred method of generating highlights. Since individuals may have diverse preferences and evaluation criteria for sports highlights, further research on accommodating personal preferences would be crucial for the automatic generation of personalized sports highlights.
Additionally, our research did not objectively measure participants' level of fan allegiance. Future studies should aim to measure it objectively, potentially through surveys or behavioral analysis, to gain a clearer understanding of its influence on highlight evaluations. Moreover, we did not consider the possibility of participants having multiple favorite teams. Addressing this in future research could yield more nuanced insights into fan behavior and preferences.
Lastly, our study focused on a small, localized participant group from South Korea. For the findings to generalize to a broader and more diverse audience, further research that includes participants from a wider range of demographics will be crucial.

CONCLUSION
This study introduced a novel method for generating sports highlights based on win probability. Specifically, we used Win Probability Added (WPA), a statistical metric rooted in historical baseball data, to identify pivotal moments and generate baseball highlights that better capture the flow of the game. Additionally, we introduced a bias injection framework that uses weighted WPA to generate biased highlights. To evaluate our methods, we generated baseball highlights using both the standard WPA-based method and the bias injection method and conducted a mixed-method user study. The results revealed that participants evaluated WPA-based highlights as more comprehensive and interesting than those from existing AI solutions, primarily because WPA-based highlights better captured the game's flow. In addition, our findings suggested that bias injection needs to be implemented carefully, especially with respect to the outcome (win/loss) of the viewer's favorite team. Lastly, we found that participants had diverse preferences and expectations for auto-generated highlights. Based on these findings, we provided design implications for automated systems that generate sports highlights. We hope our work will inform the development of personalized sports highlight generation methods, thereby enhancing viewer experience and engagement.
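For readers unfamiliar with WPA-based selection, the following sketch illustrates the general idea under simplifying assumptions. It is not necessarily the weighting used in our bias injection algorithm (Table 2). It assumes that each plate appearance carries the home team's win expectancy before and after the play. It defines WPA as the change in that expectancy from a given team's perspective and ranks plays by the magnitude of a possibly weighted WPA. The weighting rule is a hypothetical illustration of bias injection: the favored team's positive plays are scaled by `1 + beta` and its negative plays by `1 - beta`, with `beta` in [-1, 1]. The names `PlateAppearance`, `wpa`, and `select_highlights` are also hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlateAppearance:
    description: str
    we_before: float  # home team's win expectancy before the play (0..1)
    we_after: float   # home team's win expectancy after the play (0..1)


def wpa(event: PlateAppearance, team: str = "home") -> float:
    """Win Probability Added for `team` ("home" or "away")."""
    delta = event.we_after - event.we_before
    return delta if team == "home" else -delta


def select_highlights(
    events: List[PlateAppearance],
    k: int,
    favored: Optional[str] = None,
    beta: float = 0.0,
) -> List[PlateAppearance]:
    """Keep the k plays with the largest (weighted) |WPA|, in game order.

    Hypothetical bias rule: for the favored team, plays with positive WPA are
    scaled by (1 + beta) and plays with negative WPA by (1 - beta).
    beta > 0 favors the team's good plays, beta < 0 its bad plays; beta = 0
    (or favored=None) reduces to plain |WPA| ranking.
    """
    if not -1.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [-1, 1]")

    def score(e: PlateAppearance) -> float:
        if favored is None:
            return abs(wpa(e))
        w = wpa(e, favored)
        return abs(w) * ((1 + beta) if w > 0 else (1 - beta))

    # Rank event indices by score, take the top k, then restore chronological order.
    ranked = sorted(range(len(events)), key=lambda i: score(events[i]), reverse=True)
    return [events[i] for i in sorted(ranked[:k])]


events = [
    PlateAppearance("1st: home leadoff single", 0.50, 0.54),  # WPA(home) = +0.04
    PlateAppearance("4th: away two-run homer", 0.55, 0.38),   # WPA(home) = -0.17
    PlateAppearance("7th: home solo homer", 0.40, 0.52),      # WPA(home) = +0.12
]
print([e.description for e in select_highlights(events, k=2)])
# → ['4th: away two-run homer', '7th: home solo homer']
print([e.description for e in select_highlights(events, k=2, favored="home", beta=0.9)])
# → ['1st: home leadoff single', '7th: home solo homer']
```

With a strong bias toward the home team (`beta = 0.9`), the away homer drops out even though it has the largest |WPA|. This illustrates how a strong bias can reduce how well a highlight covers the game's flow.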

Figure 2: An example case of the same event (home team home run) resulting in different WPAs

Figure 3: An example of the graph of win expectancy (WE) and selected events

Figure 4: Average Positive Event Ratio and Average Time Proportion of Winning Team and Losing Team.

Table 2: Algorithm of Bias Injection Method

Table 3: Factor analysis of the sixteen survey items

Table 4: Effects of Bias Direction, Bias Strength, and Win/Loss on Biased Highlight Video Evaluations