Human Robot Interaction through an Ontology-based Dialogue Engine

This paper outlines the evolution of the Sugar, Salt & Pepper project for high-functioning children with autism, focusing on the development of a dialogue system that relies on an ontology-based knowledge base. The ontology offers a formal representation of knowledge and interrelationships within the movie domain. The dialogue system addresses issues related to predefined answers, emphasizing adaptability for multi-platform use, particularly in the context of the social robot Pepper. The paper covers the phases of construction and development in detail, highlighting implementation choices and the challenges faced.


INTRODUCTION
Ontologies play a crucial role in human-robot interaction by providing a clear framework for defining context [15, 18], user preferences, robot capabilities, and communication modes. They may address challenges related to language and context diversity, fostering a shared framework for information exchange. The integration of ontologies in human-robot interaction has led to significant advancements in healthcare, domestic robotics, and manufacturing, enhancing the efficiency and intuitiveness of interactions [13].
Sugar, Salt & Pepper [7, 8] is a collaborative project built around a living laboratory that integrates skills from educational and social robotics [9] to address the specific needs of high-functioning (Asperger) children with autism. The laboratory, designed and led by a multidisciplinary team, involved children between the ages of 11 and 13, who participated in lab sessions once a week for four months; during these sessions their exchanges and interactions with the robot Pepper were tested, with a focus on language, communication, emotions, and the enhancement of social skills. An additional goal was to provide the young participants with a space to increase their communication and socialization skills with each other and to strengthen the acquisition of strategies and autonomy related to daily activities, such as preparing a snack, with the help of Pepper, configured as a highly motivating and engaging tool. After a set of lab sessions, the children were asked to complete a questionnaire (Attribution of Mental State [14]) to assess the mental states they attributed to the robot. The results indicated a generally low score, emphasizing the necessity for improved conversational strategies to navigate the complexities of social interaction. The identified issues with Pepper's dialogue system included: predefined dialogue strategies, since the existing system predetermined the possible questions and answers, limiting flexibility during interactions; and robustness and accuracy, since Pepper struggled to consistently handle and respond to questions outside the usual topics covered.
The study concluded that refining dialogue strategies and enriching Pepper's knowledge base was crucial. Specifically, integrating a knowledge base in the form of an ontology was proposed. Such an ontology, thanks to its structured information management, could enhance the robot's ability to navigate and reason about various topics, improving conversation management and the understanding of user input. This paper introduces the development of a knowledge base (KB) in the form of an ontology. This integration aimed to elevate the robot's intelligence and reasoning abilities, enabling adaptation to user needs and personalized interactions. Unlike dialogue systems that are purely based on a stochastic approach (e.g., ChatGPT), systems based on ontologies [16] are better suited for educational and therapeutic applications, since they provide controlled and transparent outputs. The project's goal was to define an ontology in the movie domain, which was one of the conversation topics proposed by the robot during the social moments of the laboratory, emphasizing the importance of avoiding solely predefined responses for more effective and engaging conversations.

APPROACH
The goal of the project was the definition of a knowledge domain, structured in the form of an ontology, to enrich the knowledge base of the Pepper robot, so that it can establish a conversation with the human interlocutor without responding solely by means of predefined and static sentences, thus making the conversation more effective and engaging. To accomplish this goal, we identified the movie field as the ontological domain. The project consists of four main phases: i) Ontology modelling, ii) Knowledge retrieval, iii) Dialogue engine, and iv) REST API services. In the following we describe these steps in detail.

Ontology modelling
This project planned to start from a pre-existing formal ontology, so we took the ontology mentioned in Gena et al. [6] and reworked it according to the needs of our project. In the ontology definition phase, the primary goal was to create a structured representation of the movie domain so that the robot could understand and interact with the relevant concepts and information. The ontology was conceived as a set of classes, properties, and relationships that reflect the conceptual structure of the domain. Initially, an in-depth analysis of the domain was conducted to identify the main relevant categories and concepts, in order to establish a hierarchical approach for structuring the classes. The main goal was to capture the complexity of the movie domain through a hierarchy of classes that reflects the nature of the relationships between its key elements; Table 1 shows an example. To ensure a complete and accurate representation, attributes were also included in the classes to capture specific details. This process significantly enhanced the completeness and coherence of the ontology, allowing for a more accurate and detailed representation of the movie domain. The ontology alignment operation proved to be of significant importance in the development of the ontology, as it allowed the data in the main structure to be enriched with information from established sources such as Wikidata [3] and DBpedia [1], resources widely used to enrich ontologies and knowledge bases with data from the semantic web. The alignment was carried out by consistently associating the classes and properties in the ontology with the respective entities and properties in the external sources, using IRIs (Internationalised Resource Identifiers) as linking keys. Through these operations we obtained a structured and solid ontology on which we could implement the code for populating it and for its subsequent use.
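As a sketch of what such a hierarchy and alignment could look like, consider the following Turtle fragment. The class and property names are illustrative, not the project's actual ontology; the Wikidata and DBpedia IRIs, however, are the real identifiers for the film concept in those sources.

```turtle
@prefix :     <http://example.org/movie-ontology#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix wd:   <http://www.wikidata.org/entity/> .
@prefix dbo:  <http://dbpedia.org/ontology/> .

# Hypothetical class hierarchy for the movie domain
:AudiovisualWork a owl:Class .
:Movie a owl:Class ;
    rdfs:subClassOf :AudiovisualWork ;
    owl:equivalentClass wd:Q11424 ,   # Wikidata "film"
                        dbo:Film .    # DBpedia "Film"
:Person a owl:Class .
:Director a owl:Class ; rdfs:subClassOf :Person .

# Object property linking movies to directors
:hasDirector a owl:ObjectProperty ;
    rdfs:domain :Movie ;
    rdfs:range  :Director .
```

The `owl:equivalentClass` triples are what makes the IRI-based alignment explicit: a reasoner can then treat local instances and the external entities as members of the same class.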

Knowledge Retrieval
This phase covers the tasks from data extraction to the subsequent population of the ontology, in order to provide the robot with a knowledge base. The entire component was written in Python, a language widely used and well supported for tasks of this nature.
The relevant steps in the development process were the following. Creation of the relational database: we designed a relational database using MySQL, with tables dedicated to specific entities. Tables were designed with primary keys and relationships, providing a solid structure for the integration with the ontology. Data entry: we automated the process of entering data into the database using predefined SQL queries in Python. The code was structured to handle large amounts of data efficiently, inserting over 2,700,000 records, of which about 300,000 were related to movies.
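The schema creation and batched data entry can be sketched as follows, using SQLite in place of the project's MySQL instance; the table and column names are illustrative stand-ins, since the paper does not publish the actual schema.

```python
import sqlite3

# Illustrative schema: one table per entity, with a primary key.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE movie (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        release_year INTEGER
    )
""")

# Batched, parameterized inserts: executemany avoids one statement per row,
# which matters when loading millions of records.
rows = [
    (1, "La vita è bella", 1997),
    (2, "Nuovo Cinema Paradiso", 1988),
]
conn.executemany(
    "INSERT INTO movie (id, title, release_year) VALUES (?, ?, ?)", rows
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM movie").fetchone()[0]
print(count)  # 2
```

Parameterized queries also keep the loader robust against titles containing quotes, a frequent issue with movie data.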
Mapping: we used the Ontop plug-in for Protégé [4] to map ontology classes, properties and relationships to the relational database tables. The mapping was essential to link the extracted data with the semantic structure of the ontology.
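In Ontop's native mapping syntax, such a mapping pairs a SQL query over a table with an RDF triple template. The fragment below is a sketch with hypothetical table, column, and IRI names, not the project's actual mapping.

```
[PrefixDeclaration]
:		http://example.org/movie-ontology#

[MappingDeclaration] @collection [[
mappingId	movie-mapping
target		:movie/{id} a :Movie ; :title {title} .
source		SELECT id, title FROM movie
]]
```

Each row returned by the `source` query instantiates the `target` template, so the relational data becomes queryable as RDF without being physically duplicated.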
Integration on GraphDB: leveraging the above mapping, we integrated the ontology with the relational database on GraphDB, enabling fast and accurate access to information. These steps made it possible to transform data from external sources, often complex and heterogeneous, into a structured and coherent KB, ensuring the consistency and integrity of the extracted data.
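Once the data are exposed through GraphDB, the dialogue engine can retrieve facts with SPARQL queries such as the following sketch (the prefix, class, and property names are illustrative, matching no published schema):

```sparql
PREFIX : <http://example.org/movie-ontology#>

SELECT ?director WHERE {
  ?movie a :Movie ;
         :title "Nuovo Cinema Paradiso" ;
         :hasDirector ?director .
}
```

Queries of this shape are what the quiz model described later is trained to select among, given a user utterance.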

Issues.
During the development of the Knowledge Retrieval module we faced several difficulties, including managing the variable quality of data from Wikidata and DBpedia and dealing with challenges related to variable information density. These challenges were addressed by implementing scalable strategies, splitting complex operations into batches, and solving specific problems such as extracting labels from Wikidata and handling limitations on DBpedia results. In addition, we optimised code performance through caching and intermediate data storage strategies. In the development of the dialogue engine, further challenges emerged. In particular, Named Entity Recognition (NER), used to identify key elements in users' sentences, showed limitations with the Italian language, having been trained on less data than for other languages (such as English). In addition, the complexity of the entities (e.g., compound names of movies) made the task harder. Consequently, the adoption of strategies such as recognizing grammatical dependencies and sentence cleaning improved the process, enabling more accurate SPARQL queries. Another issue was the development of a custom sentiment analysis model, which proved challenging: an approach using a Multinomial Naive Bayes classifier with TF-IDF features was inconsistent, and the scarcity of labeled data in Italian made it hard to build a precise model. Better results were achieved with a BERT-based model [10], despite the complexity of linguistic nuances and the lack of specific data for the language.
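A sentence-cleaning fallback for compound movie titles can be illustrated with a minimal, purely regex-based sketch. This is an assumption of what such a step might look like; the project's actual pipeline, which also exploits grammatical dependencies, is not published.

```python
import re
from typing import List, Optional

def clean_sentence(text: str) -> str:
    """Minimal pre-processing: lowercase, strip sentence punctuation that
    confuses entity matching, and collapse whitespace."""
    text = text.strip().lower()
    text = re.sub(r"[?!.,;:]+", " ", text)    # drop sentence punctuation
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of spaces
    return text

def match_title(sentence: str, known_titles: List[str]) -> Optional[str]:
    """Longest-match lookup against KB titles, a fallback when NER misses
    compound movie names like 'Nuovo Cinema Paradiso'."""
    cleaned = clean_sentence(sentence)
    for title in sorted(known_titles, key=len, reverse=True):
        if clean_sentence(title) in cleaned:
            return title
    return None

print(match_title("Mi piace Nuovo Cinema Paradiso!",
                  ["Nuovo Cinema Paradiso", "Paradiso"]))
# Nuovo Cinema Paradiso
```

Sorting candidates by length ensures the full compound title wins over any shorter title it contains, which is exactly the failure mode the text describes for NER alone.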

Dialogue Engine
This development phase forms the heart of the conversation between the user and the robot. The main goal was to avoid the classical predefined question-answer scheme, trying instead to make the conversation more dynamic and robot-centered, to prevent the user from asking things the robot could not answer. The entire module is written in Python and AIML (Artificial Intelligence Markup Language). After an initial testing phase, it emerged that the most promising approach was to split the conversation into different components, characterised by different specific structures and machine learning models.

Dialogue phases.
The conversation was divided into the following components, in this order: (1) the AIML module for the initial stage, wherein the user interacts with the robot through the AIML language, asking questions that cover the opening stages of a conversation, facilitating the initial interaction and helping to establish a stronger connection with the user; furthermore, the robot captures relevant information (for instance demographic data) during this interaction, which is used in the later stages of the dialogue; (2) the Profiling Module, dedicated to profiling the user on the movie domain: at first, a genre prediction model predicts the user's most likely preferred genre (based on the information gathered in the previous module), then information about both user and robot preferences (such as favorite movie, character, etc.) is exchanged and used to adapt the system's responses to the user's preferences, creating a more personalised dialogue experience; (3) the Question Answering Module, in which the robot asks the user questions based on previously shared preferences (for instance, based on the user's favorite movie, the robot may ask who directed that film); the user and the robot take turns asking and answering questions, which adds entertainment to the dialogue and stimulates the active interest of the user; (4) the Survey Form, which concludes the dialogue: the robot asks questions about the interaction that took place (for instance, about the degree of satisfaction with how the robot understood the answers), and the user responds verbally, providing a numerical rating (on a Likert scale of 1 to 5) of the overall experience with the robot.
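An AIML category in the opening module might look like the following sketch; the pattern, the stored variable name, and the Italian wording are hypothetical, not taken from the project's actual AIML files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<aiml version="2.0">
  <!-- Illustrative category: capture the user's age ("ho 12 anni" =
       "I am 12 years old") during the opening phase and store it as a
       predicate for the later profiling module. -->
  <category>
    <pattern>HO * ANNI</pattern>
    <template>
      <think><set name="age"><star/></set></think>
      Piacere! Allora, parliamo di film.
    </template>
  </category>
</aiml>
```

The `<think>` wrapper stores the captured value silently, so demographic data can be collected without interrupting the conversational flow.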

AI models integration.
The AI models were defined with the aim of creating a natural, engaging and personalised conversational environment, making the interaction more intelligent in understanding and responding to the user's needs. The models implemented are: the Genre prediction model, used in the profiling module, designed to predict the most probable preferred movie genre based on the user's gender and age (asked in the AIML module, see 2.3); it uses a decision-tree-based algorithm trained on the MovieLens 20M [2] dataset, which underwent several preprocessing stages to filter the necessary data and transform it into a format suitable for the project's needs; model training resulted in an accuracy of 87%, with an overall Gini index of the decision tree nodes equal to 0.19. The Quiz model, which uses a neural network with (1) an input layer taking a bag-of-words vector of the user's query, obtained through a natural language processing approach (NER tagging), (2) two hidden layers with ReLU activation, ReLU(x) = max(0, x), and (3) an output layer for the prediction of the relevant SPARQL query; the model is trained on a dataset of intents, which constitute the different topics or themes related to the domain queries, and provides answers by extracting the data via SPARQL queries to the ontology. The Sentiment Analysis Model: in order to guarantee a more accurate result, we used a BERT-based model specific to the Italian language; sentiment analysis gained crucial importance, as it became essential to understand whether each user sentence expressed a positive or negative intent.
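The quiz model's forward pass as described (bag-of-words input, two ReLU hidden layers, an output over intents) can be sketched with NumPy. The layer sizes are toy values and the weights are random stand-ins, not the trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions; the paper does not give the real layer sizes.
vocab_size, hidden, n_intents = 12, 8, 3

# Random weights stand in for trained parameters.
W1, b1 = rng.normal(size=(vocab_size, hidden)), np.zeros(hidden)
W2, b2 = rng.normal(size=(hidden, hidden)), np.zeros(hidden)
W3, b3 = rng.normal(size=(hidden, n_intents)), np.zeros(n_intents)

def relu(x):
    return np.maximum(0.0, x)  # ReLU(x) = max(0, x)

def predict_intent(bow: np.ndarray) -> int:
    """Bag-of-words vector in, index of the most probable intent out."""
    h = relu(bow @ W1 + b1)          # hidden layer 1
    h = relu(h @ W2 + b2)            # hidden layer 2
    logits = h @ W3 + b3
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()             # softmax over intents
    return int(np.argmax(probs))

bow = np.zeros(vocab_size)
bow[[1, 4]] = 1.0  # e.g. the tokens "regista" and "film" are present
intent = predict_intent(bow)
print(0 <= intent < n_intents)  # True
```

Each predicted intent index would then select one of the predefined SPARQL query templates to run against the ontology.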
Thanks to this implementation and the models used, it was possible to create a dynamic conversation while maintaining the robot-based orientation. In other words, the robot guides the conversation towards the domain of its knowledge, but without resorting to predefined and static responses, making the interaction more fluid and adaptable to the user's needs.

REST API services
In order to make the dialogue engine scalable and shareable across different platforms, we defined REST API services to handle user and robot input and output directly and synchronously. In detail, the user's input is processed by Pepper and sent via a POST request to the dialogue engine, which sends the output to a dedicated endpoint from which Pepper reads it and returns it to the user. Through this implementation we were able to structure a communication management system that is efficient and synchronised, and above all usable on any device via API requests.
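The POST round trip can be sketched with the Python standard library alone; the endpoint path and JSON field names below are assumptions for illustration, not the project's actual API.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class DialogueHandler(BaseHTTPRequestHandler):
    """Minimal stand-in for the dialogue engine endpoint."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        # A real engine would run NER, intent prediction and SPARQL here.
        reply = {"answer": f"You said: {payload['utterance']}"}
        body = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep output quiet
        pass

# Serve on an ephemeral port in a background thread.
server = HTTPServer(("127.0.0.1", 0), DialogueHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side (the role Pepper plays): POST the utterance, read the answer.
req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/dialogue",
    data=json.dumps({"utterance": "ciao"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    answer = json.loads(resp.read())["answer"]
print(answer)  # You said: ciao
server.shutdown()
```

Because the contract is plain JSON over HTTP, any device able to issue a POST request can reuse the same engine, which is the multi-platform property the text describes.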

Test and Results
The main idea underlying this preliminary testing was to evaluate the effectiveness of the adopted strategy. In addition, the experimentation focused on different demographic parameters of the users and on the differences in the evaluations between interaction modes. Data collection relied on the survey module integrated at the end of each interaction (see Survey Form in 2.3), with methodically and hierarchically structured questions. The participants, 52 neuro-typical users (75% male and 25% female), provided numerical ratings on a Likert scale of 1 to 5, expressing detailed opinions on the various stages of the interaction, which, due to space constraints, are not reported in detail. The results of the evaluations were analysed in detail, revealing interesting trends. The questions about the robot's understanding of the users' preferences received positive average ratings (AV: 3.05, SD: 1.10), with moderate variability. Similarly, the questions related to the interactions showed positive average ratings (AV: 3.13, SD: 1.14), indicating satisfactory user experiences. However, the question on the understanding of user questions showed a lower distribution, indicating difficulties in the process of translating the questions into the SPARQL language (AV: 2.33, SD: 1.10). This suggests a specific area that may require further refinement in the interaction process (as also described in [5]). The overall evaluation of the entire dialogue received positive average ratings (AV: 3.25, SD: 0.93), with a more concentrated distribution between moderate and high ratings. The lack of extremely low ratings suggests that, despite some critical issues, most users had an overall satisfactory experience. Further analysis was conducted on subgroups of users according to age, gender, educational qualification, and type of interaction. Although some differences emerged, they were not statistically significant, indicating an overall similar evaluation between the different groups.
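The reported averages (AV) and standard deviations (SD) are plain descriptive statistics over the Likert ratings; with hypothetical ratings (not the study's actual responses) they would be computed as:

```python
import statistics

# Hypothetical ratings on a 1-5 Likert scale, for illustration only.
ratings = [3, 4, 2, 5, 3, 3, 4, 2]
av = statistics.mean(ratings)
sd = statistics.stdev(ratings)  # sample standard deviation
print(round(av, 2), round(sd, 2))  # 3.25 1.04
```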

CONCLUSION AND FUTURE WORK
Despite some identified challenges, the project demonstrated positive potential in HRI, and the collected preliminary evaluations provided a solid basis for future optimisations. We have identified a number of specific challenges that we will address in future work. Analysis of the questionnaire data revealed a wide range of user opinions at the different stages of interaction with the robot. The evaluations varied significantly, with some categories eliciting strongly contrasting reactions. This diversity of opinions provides ground for future investigations to understand the reasons behind these evaluations and to improve users' experiences with the robot. We believe that this work provides an initial solution for moving beyond dialogue generation systems, marking a significant stride toward developing increasingly sophisticated and adaptable HRI systems. Indeed, structuring a dialogue on an ontology (where a set of concepts and their relationships are delineated) allows for a more predictable and controlled output. This approach ensures greater consistency in context, is particularly useful in settings characterized by the presence of specific rules, and should be preferred when machine responses need to be controlled and predictable. The ontology can be adapted to the specific needs of a domain, allowing for greater customization and flexibility than a large generative model. Looking to the future, there are multiple directions in which this project could evolve. First, it is essential to continue working on adaptive mechanisms to enrich the robot with more knowledge and reasoning capabilities, enabling it to adapt autonomously to the needs of each user and his/her preferred choices [12]. Specifically, the following directions can be considered: i) improving the knowledge base on which the robot relies, including a thorough mapping of our ontology onto existing authoritative resources [17] and the integration of knowledge from additional sources [11]; ii) improving the AI models to better adapt the system to the user's features; iii) improving communication with the robot and exploring advanced ways of interacting with it, by improving the speech recognition steps (Speech-to-Text) or leveraging sensors for purposes such as facial expression analysis; iv) further optimizing the system to reduce response times and increase the consistency of responses.

Table 1: Ontology's classes hierarchy example

Next, relationships were established between the different classes to model the conceptual connections existing in the domain. Table 2 shows an example.

Table 2: Ontology's object property example

Table 3:
Table 3 shows an example.

Table 4: Partial extracted data matrix example

Definition of SPARQL queries: all the SPARQL queries needed to extract the data of interest from Wikidata and DBpedia were defined. (4) Label assignment: starting with an element identifier, labels and/or sitelinks were retrieved using the Wikidata API. The final matrix, containing the extracted and manipulated data with the respective retrieved labels, was then defined (Table 5).

Table 5: Final extracted data matrix example