Discovering Undiscovered States in Human Robot Verbal Interaction

Despite the ability of automatic speech recognition systems such as CMU Sphinx, the Google Speech-to-Text API, and Amazon Transcribe to recognize a variety of voices, they often face challenges in accurately processing complete information. To overcome this limitation, we propose a novel approach utilizing Markov Decision Processes. Our research involves an intelligent agent that evaluates human speech (n=1) and identifies new states through learning, enabling it to process more comprehensive information than traditional systems. The paper illustrates two scenarios: one where the intelligent agent explores by detecting undiscovered states and ultimately reaches the goal state, and another where, while discovering new states, it also revisits previously discovered states.


INTRODUCTION
When humans engage with an intelligent agent, the agent is required to effectively process all incoming information. Nevertheless, in speech recognition systems, the agent frequently encounters challenges in processing information due to its inherent limitations. This difficulty arises from issues such as difficulties in identifying specific words, variations in pronunciation across different cultures, and constraints related to processing lengthy instructions. As a result, the improvement of speech recognition technology is crucial to ensure accurate and efficient communication between humans and intelligent agents.
Multiple methods are employed in machine learning (such as HMMs [7]) and deep learning (such as RNNs and CNNs [1, 13]) to enhance the performance of an agent handling verbal information. In addition, semantic labels [8, 9, 12] are identified during parsing to help the agent gain a deeper understanding of its environment. In the HMM, RNN, and CNN approaches, the agent has a comprehensive understanding of the state space. Therefore, only brief instructions are provided to give the agent an overview of the entire state space. Providing lengthy instructions would lead to a larger state space, making it more challenging for agents to process. Speech engines such as Amazon Alexa or Echo and CMU Sphinx process human verbal instructions with less than 70 percent accuracy [5]. Hence, most of the time they fail to process human instructions longer than a certain number of words, because the states are not identified properly.
In our research, we developed a new algorithm that utilizes Markov Decision Processes to improve speech recognition through state discovery. Integrating Markov Decision Processes [10] into speech recognition systems can significantly improve the agent's ability to filter and process key information from human speech, ultimately enhancing the overall performance and effectiveness of intelligent agents across task domains, so that humans do not have to instruct the agent multiple times. For example, in a vacuum-cleaning robot context, a human could give an instruction like "Robot, your task is to clean the entire house. Start by vacuuming the living room, then move to the kitchen to do the dishes. After that, sweep and mop the floors in the hallway and bedrooms. Once you've completed the cleaning tasks, return to your charging station." This instruction is too large and complex for current AI systems to handle. To address this problem, we designed a novel algorithm that not only discovers states but also parses the speech in a way that minimizes the state and action space, which should make it easier to process.
In this paper, we applied our algorithm to a robot navigation problem in which the robot sequentially discovers states after parsing human commands. Unlike previous work using MDPs [3, 4, 6, 11] that fails due to large state space models, our approach allows for full processing of incoming human information as the robot discovers states, without relying on traditional reward-based systems. Additionally, we integrated our system with the Robot Operating System (ROS) and Gazebo in order to assess its effectiveness in a simulated environment.
Our algorithm dynamically discovers and explores new states and updates the overall problem space. To avoid the problem of a large state space, our algorithm couples actions into one. After parsing, if the action Right is followed by Forward, then instead of associating Right and Forward with two sequential states, we couple Right with Forward and associate one state with the combined action. A similar approach is taken for Left followed by Forward. For a lone Forward action, no such coupling is needed. This differs from traditional search algorithms such as Breadth First Search and Depth First Search, which operate only on a static problem space. Being in the early design stage, we associated Forward with a specific distance unit.
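The coupling rule described above can be sketched as follows (a minimal Python sketch; the function and variable names are our own illustrations, not taken from the paper's implementation):

```python
def couple_actions(actions):
    """Couple a turn (Right/Left) that is immediately followed by Forward
    into one composite action, so each composite maps to a single state."""
    coupled = []
    i = 0
    while i < len(actions):
        if (actions[i] in ("Right", "Left")
                and i + 1 < len(actions)
                and actions[i + 1] == "Forward"):
            coupled.append(actions[i] + "+Forward")
            i += 2  # consume both the turn and the following Forward
        else:
            # A lone Forward (or a trailing turn) stays uncoupled.
            coupled.append(actions[i])
            i += 1
    return coupled

print(couple_actions(["Right", "Forward", "Left", "Forward", "Forward"]))
# → ['Right+Forward', 'Left+Forward', 'Forward']
```

Each element of the result is then associated with exactly one state, which is how the coupling halves the number of states generated by turn-then-move sequences.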

Experimental Approach
We implemented the algorithm in Python using the Gazebo simulation (n=1) in a simulated Turtlebot4 environment. In this experiment, we initially used text to verify the effectiveness of the algorithm instead of using any speech engines. Python NLP packages [2] are used for parsing, and then the action verbs are filtered out. The robot then discovered and learnt the states corresponding to the actions and processed the entire text instruction.
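As an illustration of the parsing step, the sketch below extracts the action verbs from a text instruction. It uses simple keyword matching in place of the NLP packages referenced in the paper, and the vocabulary and function name are our own assumptions:

```python
import re

# The experiment used only these three actions (an assumption about
# how the vocabulary is represented internally).
ACTION_VERBS = {"right": "Right", "left": "Left", "forward": "Forward"}

def extract_actions(text):
    """Filter the action verbs out of a text instruction, in order.
    A stand-in for the NLP parsing step described in the paper."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [ACTION_VERBS[t] for t in tokens if t in ACTION_VERBS]

cmd = ("Turtlebot is ready to start from X. Turn right and go forward. "
       "Then turn left and go forward. Turn right and go forward. "
       "Turn right and go forward. You will reach your destination, Y.")
print(extract_actions(cmd))
# → ['Right', 'Forward', 'Left', 'Forward', 'Right', 'Forward', 'Right', 'Forward']
```

A real pipeline would instead rely on tokenization and part-of-speech tagging to identify the verbs, but the filtered output would take the same form.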
In our experiment we used only three actions: Right, Left, and Forward. If the robot needs to go back along its path, it is encouraged to turn 180 degrees using Right or Left. In this experiment we used a couple of test cases to verify the effectiveness of our algorithm in Gazebo:

Case I
We initiated the evaluation of the algorithm's efficiency using a lengthy statement such as "Turtlebot is ready to start from X. Turn right and go forward. Then turn left and go forward. Turn right and go forward. Turn right and go forward. You will reach your destination, Y." In this text the start state is 'X' and the end state is 'Y'. In this algorithm it is assumed that the state space will be explored sequentially. After parsing, the actions are Right, Left, and Forward. Initially the Turtlebot was facing upwards; hence the instruction is processed from the current orientation of the Turtlebot. After processing the first action, Right, the robot discovers a state, and then it processes the next action, Forward. Since Right is followed by Forward, we only consider the state after Forward and mark it as visited. We know that in MDPs the state-action pairs are explored in a sequential (s0, a0, s1, ...) fashion; in our case a0 = Right+Forward or Left+Forward. Here we initiated the algorithm with the start and end state information to simplify our process, so the last state discovered is overwritten with Y. In conventional Markov Decision Processes (MDPs), after identifying the state space, we establish precise transition probabilities for transitions originating from the initial state. However, in our navigational model, which is designed to be solved sequentially, we are faced with the prospect of taking either correct or incorrect actions.
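The sequential discovery described above can be sketched as follows. This is our own minimal reconstruction under the stated assumptions (one state per coupled action, and the last discovered state overwritten with the known goal); the state labels S1, S2, ... are illustrative, not from the paper:

```python
def discover_states(actions, start="X", goal="Y"):
    """Sequentially discover one state per (coupled) action, then
    overwrite the last discovered state with the known goal state."""
    states = [start]
    i = 0
    step = 0
    while i < len(actions):
        # A turn followed by Forward counts as one coupled action.
        if (actions[i] in ("Right", "Left")
                and i + 1 < len(actions)
                and actions[i + 1] == "Forward"):
            i += 2
        else:
            i += 1
        step += 1
        states.append("S%d" % step)
    if len(states) > 1:
        states[-1] = goal  # start/goal info is given up front
    return states

acts = ["Right", "Forward", "Left", "Forward",
        "Right", "Forward", "Right", "Forward"]
print(discover_states(acts))
# → ['X', 'S1', 'S2', 'S3', 'Y']
```

For the Case I instruction this yields four discovered states including the goal, matching the count reported for Figure 1.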

Case II
In the second case we used the same text, but at the end we asked the Turtlebot to return to its origin using the same path. This time the Turtlebot only revisited the previously discovered states, and no new information was acquired.

Fig. 1 depicts the state transition model using the MDP while discovering the states for the text "Turtlebot is ready to start from X. Turn right and go forward. Then turn left and go forward. Turn right and go forward. Turn right and go forward. You will reach your destination, Y." Our algorithm determined that there are 4 states to be discovered, including the goal state. We used information gain (IG) to understand whether the robot gains any new information during this process. During the information gain measurement, the current number of states is considered at each step. We found that in the discovery of the first couple of states the IG is around 0.087 for Case I, whereas for Case II the IG is 0.00, as no new information is gained since the robot is only revisiting previous states.
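The paper does not give the exact information-gain formula, so the sketch below is only one plausible reading: IG as the change in entropy of a uniform distribution over the currently known states. Its numbers will not reproduce the reported 0.087, but it captures the qualitative behavior of the two cases (positive IG when a state is discovered, zero when states are merely revisited):

```python
import math

def information_gain(states_before, states_after):
    """Entropy change over a uniform distribution on the known state set.
    An assumed formula; the paper reports per-step IG values but does
    not specify how they are computed."""
    return math.log2(states_after) - math.log2(states_before)

# Case I: each discovery enlarges the known state set, so IG > 0
# (here log2(4/3) ≈ 0.415 under this assumed formula).
print(information_gain(3, 4))
# Case II: revisiting known states adds nothing, so IG = 0.
print(information_gain(4, 4))  # → 0.0
```

Whatever the exact formula, the Case II result is forced: if the state count never changes, any entropy-based gain over it is identically zero.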

CONCLUSION
In this experiment we proposed a novel algorithm to discover states from a human speech stream. Our algorithm measurably gains information with each new state discovery. In the future we plan to improve this algorithm to support feature understanding of the newly discovered state space, and to apply it on simulated and real robot platforms for use in different real-world domains.

Figure 1: State Discovery using MDP

Algorithm 1 (Algorithm for discovering states): Parse the human speech and extract the action verbs and nouns.