HEARD-LE: An Intelligent Conversational Interface for Wordle

This study introduces an approach to designing an interactive audio-based platform for the popular word game Wordle. Conventional word games rely heavily on visual elements, and few audio-based alternatives exist, which limits accessibility and engagement for a broader audience. To address this gap, we propose an audio-based game that transforms the traditional Wordle interface into a conversational format, exploring the possibilities of purely audio-based interaction. To justify the design decisions for the proposed interface, we conducted a design study of the interaction cues between two players engaged in Wordle through conversation. Based on the insights obtained, we implemented the audio-based system. We then conducted a user study to compare different input types (letter-based partitions versus entire words) and to evaluate the integration of an interactive question-based interface. Four distinct game variations were created for systematic evaluation, measuring mental and physical workload, enjoyability, and system intuitiveness. This research contributes to the development of inclusive and engaging audio-based word games and provides insights into the benefits and challenges of incorporating interactive interfaces. It paves the way for more accessible gaming experiences and promotes the adoption of audio-based gaming interfaces in the broader gaming community.


INTRODUCTION
Word games have long captivated players with intellectual challenges and strategic thinking [1,31]. Traditionally, many games rely heavily on visual cues and textual representations [25], which can unintentionally exclude individuals with visual impairments [18,24] or those who prefer auditory stimuli. This limitation in accessibility calls for new approaches that transform visually dependent word games into audio-based experiences, yet few existing solutions address this problem. We aim to bridge the gap between visual and auditory gaming experiences and to create a more immersive environment for players. The objective of this paper is to explore the best approaches for converting visually dependent word games into audio-based formats, with a particular focus on minimizing cognitive workload while maintaining gameplay engagement and accuracy. We examine different mechanisms of word input and the incorporation of interactive functionalities to design a conversational interface that achieves these goals. This involves creating effective audio cues, clear instructions, and streamlined interactions to enhance the user experience. We conducted a preliminary study to analyze two factors: the presence of interactive functions and different forms of verbal input.
By assessing the impact of these elements through a Latin square design, we aim to identify the factors that minimize cognitive workload, sustain players' engagement, and preserve accuracy. The findings from this study can contribute to advancing inclusive design principles for audio-based word games and to the creation of conversational interfaces.
This research is a preliminary study, representing an initial foray into converting visually dependent word games into audio-based formats. As this is a novel and emerging area of inquiry, the study aims to provide insights and establish a foundation for future investigations in this domain.
By implementing and testing these game prototypes, we aim to gather empirical data through usability testing, user feedback, and performance analysis. Specifically, we aim to answer the following research questions: (1) To what extent can visually dependent word games such as Wordle [32] be translated to an audio-based conversational interface? (2) How can the game be developed to minimize cognitive workload while maintaining gameplay engagement and accuracy?

RELATED WORK
Previous work has developed different versions of games and evaluated cognitive workload, performance, and related metrics. Speech-based interfaces are built into many everyday devices, such as cell phones [8], computers, and vehicle software [12], to control and navigate through different tasks. Previous literature has explored the ability of speech-based interfaces to communicate and organize information [14]. However, previous work has also highlighted how speech-based interfaces open space for ambiguity [6].
In recent years, research efforts have focused on the implementation and impact of conversational and audio-based interfaces for enhancing accessibility for visually impaired individuals [9,19]. Numerous studies have explored the effectiveness of these interfaces in supporting visually impaired users with daily tasks [3], communication [11,16], entertainment [7], and information retrieval [26,27,33]. The interactions offered by these interfaces through natural language processing and speech recognition technologies have been praised as user-friendly and intuitive, enabling blind individuals to navigate various digital platforms [36,37]. Previous literature has also emphasized how conversational interfaces can help with temporary limitations, for example when users are operating vehicles [2]. However, the current literature has not addressed the adaptation of visually dependent word games to conversational interfaces.
Earlier research has measured various forms of cognition across multiple games, each differing in features and rules [20,34]. Researchers have also used different in-game features to measure cognitive load [23]. Cognitive workload, often evaluated through subjective rating scales [28] or physiological measures such as heart rate [10,15,17] and pupil dilation [35], provides valuable insights into players' mental effort [29], attention allocation [22], and information processing [21]. Such research has offered a comprehensive understanding of players' cognitive and perceptual capabilities during gaming [5,13,30], facilitating the development of more enjoyable experiences. Nevertheless, a gap remains in measuring cognition in audio-based word games.

DESIGN
To discover intuitive interaction cues between the user and the interface during play, we first conducted a design study that explores the interaction cues between two people playing Wordle through conversation. The goal of the design study is to gather insights from this conversational Wordle game that justify the design decisions for the proposed interface's input commands and output responses.
For our design study, we recruited a total of 6 participants (3 female, ages 20 to 40) through convenience sampling at a large public university. Before performing the task, participants completed a pre-study questionnaire consisting of demographic questions and five-point Likert-scale questions. All 6 participants responded to every question. All participants stated that they were familiar with Wordle and other word games; however, none of them had played audio-based games before.
Each design study activity consisted of a pair of participants playing respective roles; each participant took part in the activity only once and in only one role. The expected outcome of this activity was to gain insight into the different methods people use to play word games and to convert those into in-game features. Observing how participants interacted in the audio-based Wordle game and used the question-asking option, we integrated their feedback and suggestions into the game's mechanics. The restrictions were: (1) little to no intervention from the study conductor; (2) Participant 1, serving as the guesser, was allowed to ask any question that did not involve definitions, visuals, or hints; and (3) Participant 2, serving as the word game, was told the word ahead of time and was expected to answer the questions Participant 1 asked.
We also obtained additional information, such as the origins and parts of speech of words, that we could not include due to limitations but that may serve as future implementations.
The activity consisted of two roles. The player communicated their Wordle guesses and asked clarification questions. The moderator received and reviewed the Wordle solutions, provided feedback on guesses, responded to clarification questions, and gave the final solution.
We assigned a study conductor to observe the participants' activity. Their role was to describe the Wordle game rules and assign participant roles. They also provided the moderator with the Wordle answers, instructed participants on how to run the activity, and intervened when participants strayed from the activity.
While the participants were engaged in the activity, the study conductor audio-recorded the conversation and took supplemental notes on their interactions. After all participants completed the activity, the study conductors transcribed the players' questions, and a separate researcher conducted a thematic analysis of the transcriptions. The analysis yielded 56 codes, which were grouped into 10 categories and five emergent themes. Four of the five emergent themes served as insights for designing the functionality of our proposed interface: Letter Check (T1), Position Check (T2), Specific Inquiries (T3), and Word Guess (T4).
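To illustrate how the four themes could translate into interface commands, the following Python sketch routes a transcribed utterance to T1 through T4 using simple keyword rules. The trigger phrases, the pattern order, and the function name `classify_utterance` are illustrative assumptions; the design study does not specify how utterances should be categorized, and a deployed system would likely need a much richer set of phrasings.

```python
import re

# Hypothetical trigger patterns for three of the design-study themes.
# Order matters: the first pattern that matches wins, so position
# questions are checked before general letter questions.
THEME_PATTERNS = [
    ("T2 Position Check", re.compile(r"\b(position|spot|place|slot)\b")),
    ("T1 Letter Check", re.compile(r"\b(is there|does it have|contains?|letter)\b")),
    ("T3 Specific Inquiry", re.compile(r"\b(which|what) letters\b|\b(guessed|used)\b")),
]


def classify_utterance(text: str) -> str:
    """Map a transcribed utterance to one of the four emergent themes."""
    normalized = text.lower().strip()
    # A lone five-letter word is treated as a guess (T4 Word Guess).
    if re.fullmatch(r"[a-z]{5}", normalized):
        return "T4 Word Guess"
    for theme, pattern in THEME_PATTERNS:
        if pattern.search(normalized):
            return theme
    return "Unrecognized"
```

Checking the position pattern before the letter pattern is a deliberate choice in this sketch, so that a question such as "Is there an A in position 2?" is treated as a position check rather than a letter check.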

IMPLEMENTATION
For this work, we implemented our proposed interface, called HEARD-LE, as an audio-based version of Wordle. The game was adapted from the original Wordle game, programmed in Python, and introduced spoken statements and spoken input for a conversational interactive platform. Our prior design study, specifically its emergent themes, served to justify our design decisions for the interaction between the human player (the player) and the computer moderator (the moderator).
The emergent themes led to the different features of our interface, which consist of the game interaction and the guess recognition. For the first component of our interface, we introduce the game to the player through interactions with the moderator.
To process the player's spoken guesses and inquiries, we used the Google Speech Recognition API because it can process statements without time limits; we observed that the pauses between the player's utterances were highly variable (between 10 and 60 seconds). We also used arrays to store previous game information, including the letters that had been used, the positions in which they had appeared, previous guesses, and the state of each character: in the word and in the correct place, in the word but in the wrong place, or not in the word at all. In our implementation, a different method was called for each type of question the player asked. Supporting questions also required a component that interprets what the player is asking for and then provides an appropriate answer.
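The game state described above can be sketched in Python as follows. This is a minimal illustration assuming five-letter words; the names `score_guess`, `GameState`, and `letters_in_position` are hypothetical and not taken from the HEARD-LE code. The two-pass scoring marks exact matches first and then assigns "wrong place" marks only while unmatched copies of a letter remain in the answer, which is how repeated letters are handled in Wordle-style feedback.

```python
from dataclasses import dataclass, field

CORRECT, PRESENT, ABSENT = "correct", "present", "absent"
RANK = {ABSENT: 0, PRESENT: 1, CORRECT: 2}


def score_guess(guess: str, answer: str) -> list[str]:
    """Return Wordle-style feedback for each position of the guess."""
    result = [ABSENT] * len(guess)
    unmatched: dict[str, int] = {}
    # Pass 1: mark exact matches and count answer letters left unmatched.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = CORRECT
        else:
            unmatched[a] = unmatched.get(a, 0) + 1
    # Pass 2: mark misplaced letters only while unmatched copies remain.
    for i, g in enumerate(guess):
        if result[i] != CORRECT and unmatched.get(g, 0) > 0:
            result[i] = PRESENT
            unmatched[g] -= 1
    return result


@dataclass
class GameState:
    """Hypothetical container for the information a player may ask about."""
    answer: str
    guesses: list[str] = field(default_factory=list)
    feedback: list[list[str]] = field(default_factory=list)
    letter_status: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.answer = self.answer.lower()

    def submit(self, guess: str) -> list[str]:
        guess = guess.lower()
        marks = score_guess(guess, self.answer)
        self.guesses.append(guess)
        self.feedback.append(marks)
        for letter, mark in zip(guess, marks):
            # Keep the strongest evidence seen so far for each letter.
            old = self.letter_status.get(letter)
            if old is None or RANK[mark] > RANK[old]:
                self.letter_status[letter] = mark
        return marks

    def letters_in_position(self, index: int) -> list[str]:
        """Letters previously guessed at a zero-based position (a position check)."""
        return [g[index] for g in self.guesses]
```

Under these assumptions, a question such as "What letters have I tried in position 3?" could be answered from `letters_in_position(2)`, since the sketch uses zero-based indices.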

RESULTS
This research aims to evaluate the effects of various factors on cognitive workload, engagement, and accuracy in the context of gaming. To achieve this objective, we conducted a comparative analysis of four distinct games and analyzed the feedback provided by players. The interface allows participants to speak their responses and other commands to play speech-based audio versions of Wordle. Using the information learned from the design study, we incorporated commonly asked questions and decided which aspects of the program to evaluate. To understand the effectiveness of features in the game, we used a Latin square design, which controls for confounding variables such as presentation order and reduces bias in our experimental study. The observed factors are letter-based partitions versus entire word-based input, and the ability to access previous information through questions. Combining these two factors gave us a total of four different games to test:
• Game A: Letter-partitioned input with no questions
• Game B: Entire word input with no questions
• Game C: Letter-partitioned input with questions
• Game D: Entire word input with questions
We picked these factors because the evaluation of letter partitions versus whole word input would reveal natural limitations of audio-based word games, such as homophones and mispronunciations. Additionally, the interactive question-based interface was chosen because we believed it would allow users to hold less information in memory and could therefore affect cognitive workload.
For our interface study, we recruited a total of 5 participants (3 female, ages 20 to 40) through convenience sampling at a large public university. Each interface study session consisted of one participant and one study conductor. The expected outcome of this activity was to gain insight into the ability of each game factor to reduce cognitive workload while also observing enjoyment and the cumbersomeness of the games. The restrictions were: (1) little to no intervention from the study conductor, and (2) users were only allowed to ask questions that were implemented within the game. Through the interface study, we obtained both quantitative and qualitative feedback through surveys and interviews. After the experiment, participants completed a post-study questionnaire consisting of the following Likert-based items (1 = strongly disagree, 5 = strongly agree): easy, potential use, convenient, confident level, and reverse learning curve. The interview was also conducted after the experiment.
There were three roles in the activity. The player communicated Wordle guesses, asked clarification questions, and shared their experiences. The moderator explained the game rules and administered the feedback measures. The program received and processed Wordle solutions, provided correctness feedback, responded to clarification questions, and gave the final solution. The study moderator first requested consent and then gave participants instructions, including role responsibilities and a description of the game. After participants practiced with the original Wordle [32] game to become familiar with the core game, they played the different HEARD-LE versions and filled out a NASA-TLX form for each version. Finally, participants filled out a post-study questionnaire and were verbally interviewed.
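The counterbalanced ordering of Games A to D can be illustrated with a balanced (Williams) Latin square. The paper does not state which Latin square was used, so this construction is an assumption: for an even number of conditions, it places each game once in every session position and lets each game immediately precede every other game exactly once. The function name `balanced_latin_square` is hypothetical.

```python
def balanced_latin_square(conditions: list[str]) -> list[list[str]]:
    """Build a Williams balanced Latin square (carryover-balanced for even n).

    Every condition appears once in each row and each column; for an even
    number of conditions, each condition immediately precedes every other
    condition exactly once across the rows.
    """
    n = len(conditions)
    # First row of indices follows the pattern 0, 1, n-1, 2, n-2, ...
    first, low, high = [0], 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Each later row shifts every index by the row number, modulo n.
    return [[conditions[(idx + r) % n] for idx in first] for r in range(n)]


for participant, order in enumerate(balanced_latin_square(["A", "B", "C", "D"])):
    print(f"Participant {participant + 1}: Games {' -> '.join(order)}")
# Participant 1: Games A -> B -> D -> C
# Participant 2: Games B -> C -> A -> D
# Participant 3: Games C -> D -> B -> A
# Participant 4: Games D -> A -> C -> B
```

Participant i would then play the games in the order given by row i modulo 4. With 5 participants, one order is necessarily used twice, so counterbalancing in a study of this size can only be approximate.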

Quantitative Analysis
Given the information from the user studies, we compared the effect of each individual factor on the metrics measured in our surveys. We therefore assessed the significance of each metric separately to examine whether individual factors affected the responses to individual questions. The tests covered 5 Likert-based questions and 6 NASA-TLX metrics, and the process was conducted twice to examine both factors in our games. To identify significant differences between the levels of each factor, we used t-tests. The results are illustrated in Figure 1. With the significance threshold set at 0.05, the following significant differences were found: (1) Comparing games with questions and games without questions on the NASA-TLX form, the results (p=.048) indicate that questions are associated with higher levels of effort than games without questions.
(2) Based on the NASA-TLX results, we examined the user's perceived frustration during gameplay. Comparing games with questions and games without questions, we found a statistically significant difference (p=.026): games with questions induced higher levels of frustration than those without questions.
(3) Accessing previous information through questions was rated as less cumbersome, with a statistically significant difference (p=.039) between games with questions and games without questions.
(4) Spelling out the word letter by letter required a lower level of effort than entire word input, with a statistically significant difference (p=.047) between spelling-based and whole word-based input.
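The comparisons above can be sketched as paired t-tests, under assumptions the paper does not state: that each participant's scores for the two games sharing a factor level (for example, Games C and D for "with questions") are averaged, and that the test is paired because every participant played all four games. The sketch uses only the Python standard library and computes the two-sided p-value by numerically integrating the Student's t density with Simpson's rule; the helper names are hypothetical, and `scipy.stats.ttest_rel` would be the usual library alternative.

```python
import math
from statistics import mean, stdev


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 10_000) -> float:
    """Two-sided p-value from Simpson's rule on the t density (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_central = total * h / 3  # approximates P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * half_central)


def paired_t_test(a: list[float], b: list[float]) -> tuple[float, int, float]:
    """Paired t-test on per-participant scores; returns (t, df, p)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    df = len(diffs) - 1
    return t, df, two_sided_p(t, df)


def factor_level_means(scores: dict[str, list[float]], games: tuple[str, ...]) -> list[float]:
    """Average each participant's scores over the games that share one factor level."""
    n_participants = len(scores[games[0]])
    return [mean(scores[g][p] for g in games) for p in range(n_participants)]
```

For example, `paired_t_test(factor_level_means(scores, ("C", "D")), factor_level_means(scores, ("A", "B")))` would compare one NASA-TLX subscale with and without questions, where `scores` maps each game letter to one score per participant.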

Qualitative Analysis
The participants in the preliminary study provided thorough feedback about how each system behaved. From the pre-study questionnaire, we observed that 80% of users had played word games before; however, none of the participants had played audio games in the past. Overall, they found the general concept intuitive and enjoyable. Users also generally enjoyed the option of asking questions, as it reduced the need to memorize different aspects of the game state: "For the versions where I wasn't allowed to ask questions, I was like, oh great, if I can't remember something, then the information is gone forever." When asked which feature they liked about the game, 100% of users named the option of asking questions. Through the interviews, participants suggested other possible questions, such as asking for the letters previously placed in certain positions. When asked to rank the different games on enjoyability, 60% of participants ranked Game C highest. Conversely, 40% of users ranked Game B lowest. Many users also expressed frustration with whole word-based input because the API sometimes misheard words. This happened three times across all user studies, where the API heard "pearl" as "pearly," "consonants" as "continents," and "yes" as "Jess." During the study, 60% of the participants also stated, without prompting, that they believed an audio-based game was more difficult than a visual interface. Nevertheless, the success rate for guessing words was fairly high, with the word guessed correctly in 80% of rounds. We note that this is still lower than the 88% success rate reported in a randomized controlled study [4]. This drop is expected given the challenges that an audio-based game creates.

DISCUSSION AND FUTURE WORK
Although the option of in-game questions can increase the effort required and frustration, the benefits of the assistance the questions provide could potentially outweigh these costs. In verbal interviews, many users also expressed a dislike for entire word input due to pronunciation errors or confusion caused by homophones. This is supported by the t-test results, in which spelling-based input required significantly less effort than entire word input. Users also criticized the intuitiveness of the questions. For example, two participants used questions that behaved differently than they expected, which increased confusion and frustration during gameplay. This issue can be addressed with more intuitive wording of questions in later iterations of the game. We also observed that although questions decreased the need to memorize board positions, they added another factor for the player to think about, which increased the frustration and cumbersomeness of the interfaces with questions. This issue could be addressed by experimenting with proactive prompts, in which the program offers useful information specific to the user, removing the need for the user to ask for it directly. The biggest limitation of our work is that, because this is a preliminary study, only 5 participants took part in the interface study. Future work should evaluate more users and examine other factors to control, for example interactive hints and prompts, time constraints, and different sound cues to convey information. Some limitations of the speech recognition interface also need to be addressed in future iterations; for example, some words were misheard during play. This issue could be addressed by giving all games an alternative spelling option that lets the player correct the computer when homophones or pronunciation errors occur.
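The alternative spelling option proposed above could rely on a normalizer that turns a spelled-out transcript into a candidate guess. The Python sketch below is hypothetical: the letter-name table is a small illustrative sample rather than what any particular speech recognizer returns, and `letters_to_word` rejects unusable input (returning `None`) so that the game can ask the player to repeat instead of guessing.

```python
from typing import Optional

# Hypothetical sample of spoken letter names; a real table would need to be
# tuned to what the speech recognizer actually returns.
LETTER_NAMES = {
    "ay": "a", "bee": "b", "be": "b", "see": "c", "sea": "c", "dee": "d",
    "ee": "e", "eff": "f", "gee": "g", "aitch": "h", "eye": "i", "jay": "j",
    "kay": "k", "el": "l", "em": "m", "en": "n", "oh": "o", "pee": "p",
    "cue": "q", "queue": "q", "are": "r", "ess": "s", "tee": "t", "tea": "t",
    "you": "u", "vee": "v", "ex": "x", "why": "y", "zee": "z", "zed": "z",
}


def letters_to_word(transcript: str, length: int = 5) -> Optional[str]:
    """Convert a spelled-out transcript into a guess, or None if it is unusable."""
    text = transcript.lower()
    # "W" is spoken as two words, so collapse it before splitting into tokens.
    for phrase in ("double you", "double u"):
        text = text.replace(phrase, "w")
    letters = []
    for token in text.replace(",", " ").replace(".", " ").split():
        if len(token) == 1 and token.isalpha():
            letters.append(token)
        elif token in LETTER_NAMES:
            letters.append(LETTER_NAMES[token])
        else:
            return None  # unrecognized token: ask the player to repeat
    word = "".join(letters)
    return word if len(word) == length else None
```

Under these assumptions, both "p e a r l" and "pee ee ay are el" normalize to "pearl", while a whole-word transcript such as "pearly" is rejected rather than accepted as a six-letter guess.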
Our goal was to develop a system that allows users to play a visually based word game while conveying information only through audio. To achieve this, we investigated the design of an audio-based Wordle game with two factors: letter-based partitions versus entire word-based input, and the ability to access previous information through questions. Preliminary results concerning the participants' mental workload and feedback about the design indicated that letter-based partitions were a more
Figure 1: (a) NASA-TLX results (lower is better) for games with and without available questions. (b) NASA-TLX results (lower is better) for games with spelling vs. word inputs. (c) Post-survey results (higher is better) for games with and without available questions. (d) Post-survey results (higher is better) for games with spelling vs. word inputs.