Initial Study on Robot Emotional Expression Using Manpu

In recent years, robots have started to play an active role in various places in society. The ability of robots not only to convey information but also to interact emotionally is necessary to realize a human-robot symbiotic society. Many studies have been conducted on the emotional expression of robots. However, as robots come in a wide variety of designs, it is difficult to construct a generic expression method, and some robots are not equipped with expression devices such as faces or displays. To address these problems, this research aims to develop technology that enables robots to express emotions using manpu (a symbolic method used in comic books, expressing not only the emotions of humans and animals but also the states of objects) and mixed reality technology. As the first step of the research, we categorize manpu and use large language models to generate manpu expressions according to the dialogue information.


INTRODUCTION
Robots are increasingly active in various places in society, such as homes, restaurants, shops, and factories. To build good relationships between robots and humans as companions living and working in the same space, it is considered important not only to communicate information but also to interact with emotional expressions. It has been reported that emotional expressions by a robot can make the robot more engaging and likable [1]. Emotional expressions also enable intuitive communication and are considered useful as a means for robots to communicate their own status [5]. Previous robotics research has demonstrated the effectiveness of various emotion expression methods, including facial expressions, movements, speech, and contact interaction. However, in many cases, these approaches rely on specialized hardware, such as faces with changing expressions, which makes the developed methods dependent on the robot hardware. As a result, expression methods must be tailored to each individual robot, which is costly to develop.
Depending on their use and location, robot designs are diverse: humanoid, animal-like, character-like, arm-only, and so on. As an emotional expression method that can be used regardless of hardware design, we focus on "manpu." Manpu is a Japanese word for a comic symbol or comic mark, a symbolic expression used in comics (Figure 1). Manpus are mainly used to express the emotions, inner states, and situations of characters. In addition, they are also used to express the states of animals and objects. Therefore, manpus are considered an intuitive method of expression that can be applied to a wide variety of robots, including robots without explicit facial features. Previous research showed that using manpus on a humanoid robot makes it easier for humans to recognize the robot's emotions [15]. However, no system has been developed to generate manpus that match the dialogues in actual speech interaction. In order to build a general method for manpu expression, it is necessary to create a systematic database of manpus and to develop a model for generating manpus that match the context.
In addition, a method is needed to display the generated manpu in a way that adapts to the robot. We expect that a presentation method utilizing mixed reality (MR) technology would be useful, as it is low-cost and less dependent on the hardware design of the robot.
Research on the use of MR technology to manipulate and teach robots has been conducted in recent years [8,21], and MR devices are expected to become more widespread in the future. In terms of daily interaction with robots, research has been conducted on using MR to realize the expression of eating behavior, which is difficult to achieve with a robot alone [10]. In future research, we plan to develop technology that recognizes and estimates the robot's body parts in order to display manpus in appropriate positions relative to the robot in the MR space.
The final goal of this research is to build good cooperative and symbiotic relationships between humans and robots by enabling robots to perform rich emotional expressions using manpu. As a first step, this paper addresses the classification of manpu and their annotation methods, as well as a method for generating manpu based on dialogue information using a large language model (LLM). The paper is structured as follows. In this section, we described the novel concept of using manpu expression in human-robot interaction. Section 2 summarizes related research on robotics, especially in the field of human-robot interaction, and on comic engineering. Section 3 describes manpu classification and annotation, and Section 4 presents preliminary results of generating manpu using LLMs. Finally, Section 5 concludes and states future work.

RELATED RESEARCH

Emotional Expression by Robots
Various methods for expressing emotions by robots have been studied. Facial expression is a typical method. Technologies that create hardware resembling humans, as in the case of androids, are well known [12]. Methods that use projection to enable various facial expressions, such as Furhat [4], have also been proposed. While those robots present human-like facial expressions, some robots, such as Haru [13], make facial expressions like cartoon characters. Other research has been conducted on expressing emotions by combining facial expressions and colors [9].
There is also much research on using a robot's entire body for emotional expression. As for gesturing, there are studies that generate gestures from dialogue content [19] and studies that generate gestures from human movements and voice information [25]. Noguchi et al. proposed emotional expression through weight shifting of the robot using a movable weight [20].
Voice is another emotional expression technique that can be used with many robots [7]. Speech synthesis technology has developed in recent years, and there is research on methods for generating emotional speech using deep learning techniques [18].
However, methods for expressing emotions using facial expressions and body movements often depend heavily on the robot's design and body structure and must be developed individually for each robot. Although voice is a hardware-independent method, it is sometimes difficult to convey emotions only through nuances and inflections of the voice. In this study, we focus on the manpu, which is used in comics to express the emotions and states of humans, animals, and other objects, as a general emotional expression method that does not depend on the hardware design of the robot. It is also expected that manpu can be combined with existing emotional expression methods to enrich the interaction.

Comic Engineering
Comic engineering is a research field that explores technologies for utilizing comics and their potential applications. Much research has been conducted on comics from a variety of perspectives. For example, regarding the extraction of information from comics, technologies for recognizing items such as speech bubbles, text, and characters have been developed [6,22]. Other research has been conducted to structure the elements of comics [23] and to support the production of comics [14].
Research on manpu has also been conducted in the past. Analysis of the meanings and expressions of manpus has revealed that the same type of manpu can have different meanings depending on the position and situation in which it is drawn [2,3]. For example, a manpu of a drop can be a tear, a sniffle, or a splash of water.
There has also been research into utilizing techniques from comics for communication. Yamanishi et al. developed a method for estimating the shape of speech bubbles from the content of a message, with the aim of visualizing the nuance of the message through the bubble shape [26]. Gemba et al. proposed a method for expressing emotions in avatars by using manpu [11]. The study showed that users' emotions were more easily conveyed when manpus were used.
In the field of robotics, there is research on utilizing expressions used in comics. KOBIAN-R II has a head that can make facial expressions with comic marks, including manpu [15]. Flexible full-color LED displays show some manpus, such as cross-popping veins and a tear mark. There are also mechanisms that push and pull a sheet to express black vertical lines on the forehead and cheek, and black lines as wrinkles on the jaw. In an experiment with Japanese participants, emotion recognition rates improved when manpu expressions were shown on the robot's face. The results show that manpu expressions adapted to the cultural background can help convey emotions.
Although various studies have been conducted in the field of comic engineering, no studies have systematically associated manpus with characters' dialogues. As for research on the use of manpus for expressing emotions in avatars and robots, it has been limited to rule-based mappings of manpus to the emotions to be expressed, based on typical usage. For robots to express emotions richly in real-world interactions, a technology that realizes a variety of manpus according to context is required. In this study, we examine the annotation of manpus with characters' dialogues and a model for generating manpus according to the dialogue.

Use of LLMs in Robotics
In recent years, there has been a remarkable evolution of LLMs, as represented by ChatGPT. The use of LLMs has also become increasingly popular in robotics research. One study uses ChatGPT for robotics applications to perform tasks [24]: high-level robot APIs and function libraries are defined, and text prompts describing the task goal and the available functions are used. There is also research using an LLM to express emotions through non-verbal cues for empathetic communication by a robot [17]; LLM prompts that include the persona of the robot are used. We thought an LLM could be useful for generating manpus according to dialogue information. Therefore, this paper examines the generation of manpus by adding knowledge data about manpus to LLMs.

MANPU ANNOTATION

Manpu Classification
In order to display manpu according to the robot, we considered that the classification of the types of manpu, their assigned positions, and the number of presented manpus are important.
In the research field of comic engineering, there are some studies on the classification of manpu [2,3]. However, they classified manpus mainly from a single comic book, and the classification is not exhaustive. Using information from a book [16] and web pages, we extracted the types of manpu described in Figure 2. In terms of exclusion criteria, we excluded onomatopoeia, manpus related to backgrounds and landscapes, and those that could be classified as facial expressions or gestures.
Regarding the classification of the assigned position of manpu, previous research only annotated rough positions and assumed a two-dimensional representation [2]. Assuming that manpu will ultimately be applied to robots in three-dimensional space, we adopted a classification relative to each part of the human body, such as in front of, behind, outside, and inside each part. We also introduced a more detailed classification, particularly for the face, so that a detailed representation can be achieved.
The number of manpus is also important in the representation. For example, when representing sweat, several manpus of splashing water are often used. As shown in Figure 1, some items that are typically used as combinations of several marks (e.g., sleeping (zzz) and awareness lines ( \ | / )) are treated as a single set.
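The three annotation dimensions just described (type, assigned position, and number) can be captured in a simple record. The following is a minimal sketch; the type names, body-part strings, and relation names are illustrative placeholders, not our actual taxonomy:

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical subsets of the classification; the real taxonomy
# follows the manpu types extracted in Figure 2.
class ManpuType(Enum):
    SWEAT_DROP = "sweat_drop"
    POPPING_VEIN = "popping_vein"
    SLEEPING_ZZZ = "sleeping_zzz"
    AWARENESS_LINE = "awareness_line"

class RelativePosition(Enum):
    IN_FRONT = "in_front"
    BEHIND = "behind"
    OUTSIDE = "outside"
    INSIDE = "inside"

@dataclass
class ManpuAnnotation:
    manpu_type: ManpuType
    body_part: str              # e.g. "head", "right_cheek"
    position: RelativePosition  # relation to the body part
    count: int                  # number of marks; a combined set counts as one

# Example: three splashing-water marks shown outside the head.
ann = ManpuAnnotation(ManpuType.SWEAT_DROP, "head", RelativePosition.OUTSIDE, 3)
```

Treating combined sets (such as zzz) as a single annotation keeps the count field meaningful for marks that are repeated for emphasis.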

Development of an Annotation Tool
We developed a tool for annotating manpus and related dialogues in the dataset, as shown in Figure 3. As a first step, we focused on manpus associated with the characters. The image of the target page is placed on the left side of the screen, and each manpu is annotated with a bounding box. In the bottom right, we placed buttons for selecting the type and position of the manpu. There is also a field for entering the number of manpus. In the top right, there are pull-down boxes for selecting the name of the character to whom the manpu relates and the content of the dialogue, if any, as well as the name and dialogue content of the character who appeared before the manpu-related character. The names of these characters and the contents of the dialogues are based on annotation data originally provided in the dataset.

Dataset Annotation
We use a comic dataset called Manga109, which consists of 109 comics. It was developed in Japan and consists of Japanese comics by 94 authors. The included comics were published from the 1970s to the 2010s and cover a wide range of target audiences and genres. The total number of comic pages in the dataset is 21,142. It is annotated with frames, text, and character faces and bodies, with more than 50,000 annotations in total. However, there are no annotations of manpu.
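Manga109 distributes its annotations as XML files, so the existing dialogue annotations can be read with a standard XML parser. The sketch below uses a simplified stand-in for the annotation layout (the embedded sample, element names, and attributes here are an illustrative assumption, not the exact Manga109 schema):

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a Manga109-style annotation file; the real
# files also contain frame, face, and body elements with bounding-box
# attributes.
SAMPLE_XML = """
<book title="Example">
  <pages>
    <page index="0" width="1654" height="1170">
      <text id="t1" xmin="100" ymin="80" xmax="220" ymax="160">Hello!</text>
      <text id="t2" xmin="400" ymin="90" xmax="560" ymax="200">What?</text>
    </page>
  </pages>
</book>
"""

def extract_dialogues(xml_string):
    """Collect (page index, dialogue text) pairs from all text annotations."""
    root = ET.fromstring(xml_string)
    dialogues = []
    for page in root.iter("page"):
        for text in page.iter("text"):
            dialogues.append((int(page.get("index")), text.text))
    return dialogues

print(extract_dialogues(SAMPLE_XML))  # [(0, 'Hello!'), (0, 'What?')]
```

Dialogue pairs extracted this way can be fed to the annotation tool, which links each manpu bounding box to the speaking character and their line.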
We annotated the manpus in the first part of one comic ("Aisazunihairarenai" by Masako Yoshi) in the dataset using the developed annotation tool. In the next section, we describe the generation of manpu expressions according to the dialogues in unannotated pages of the same comic book, using the annotation data as knowledge.

PRELIMINARY EXPERIMENT OF GENERATING MANPU EXPRESSION

Experiment Using GPTs
In this paper, we investigated the ability of an LLM to generate manpu expressions. We used the GPTs feature released by OpenAI in November 2023. Using GPTs, we can customize ChatGPT (GPT-4) to create a tool for our own purposes.
GPTs provide a Knowledge function, which can store large amounts of information and retrieve it when necessary. Many types of files can be used as "knowledge data," not only images but also documents such as PDF files and spreadsheets. ChatGPT retains the uploaded files and creates specialized GPTs. We built models that output manpu information according to the context by giving manpu-related data to GPTs as knowledge data. In addition, the instructions were written so that the type, position, and number of manpus are output.
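To make the expected input and output concrete, the instruction and the parsing of the model's reply can be sketched as follows. The prompt wording, JSON field names, and the example reply are illustrative assumptions, not the exact instructions given to our GPTs:

```python
import json

# Instruction mirroring the idea of outputting type, position, and
# number of manpus as structured data (wording is illustrative).
SYSTEM_PROMPT = (
    "You generate manpu (comic symbols) for a robot's dialogue. "
    'Reply with JSON: {"type": ..., "body_part": ..., '
    '"position": ..., "count": ...}'
)

def build_messages(dialogue):
    """Assemble a chat request asking for a manpu expression."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user",
         "content": f"Dialogue: {dialogue}\nGenerate the manpu expression."},
    ]

def parse_manpu(reply):
    """Parse the model's JSON reply into a manpu specification."""
    spec = json.loads(reply)
    return spec["type"], spec["body_part"], spec["position"], int(spec["count"])

# The actual call (requires an API key) could use the OpenAI client, e.g.:
#   from openai import OpenAI
#   reply = OpenAI().chat.completions.create(
#       model="gpt-4", messages=build_messages("What? Are you sure?")
#   ).choices[0].message.content

# Parsing a hypothetical reply:
example = '{"type": "sweat_drop", "body_part": "head", "position": "outside", "count": 2}'
print(parse_manpu(example))
```

Requesting a structured reply of this kind makes it straightforward to map the generated type, position, and count onto a display system later.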

Experiment Conditions
The following types of knowledge data for GPTs were prepared: basic data about manpu, image data of the annotated comic pages, and manpu annotation data. The basic data covers the classifications of manpu types and positions and explanations of the general use of each manpu. It contains an xlsx file with descriptions and usage of each type of manpu, with reference to [16], a yml file describing the classification of manpu positions, and a png file of sample manpu images, such as in Figure 2. Since there is a limit to the number of files that can be uploaded to GPTs, we arranged many manpu images in a single image file in a grid pattern. Using prompts, we instructed the GPTs on the corresponding name for each manpu image in the grid. The image data of the annotated comic pages is provided as png files, and the manpu annotation data is provided as a csv file. The annotation data contains the type, position, and number of each manpu in association with the dialogues and characters in the annotated comic. We compare the following four conditions regarding the knowledge data given to GPTs: basic data only, basic and image data, basic and annotation data, and all three.
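As an illustration of the basic knowledge data, the yml file classifying manpu positions might be organized like the following sketch (the part and relation names here are hypothetical placeholders, not the actual file contents):

```yaml
# Hypothetical sketch of the position-classification file;
# the real file follows the taxonomy of body parts and relations
# (in front of, behind, outside, inside) described in the paper.
body_parts:
  head:
    relations: [in_front, behind, outside, inside]
  face:
    subparts: [forehead, cheek, jaw]   # finer granularity for the face
    relations: [in_front, outside]
  torso:
    relations: [in_front, behind, outside]
```

A structured file of this kind lets the GPTs ground its position outputs in a fixed vocabulary instead of free-form text.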

Preliminary Results
Figure 4 shows an example of preliminary results comparing the four different sets of knowledge given to GPTs. The target situation is a frame in the unannotated part of the comic. In the prompt, the dialogue contents were given to generate manpu. The table shows the outputs of the GPTs for each set of knowledge data, together with images of a robot model onto which the manpu illustrations were manually painted.
Although there are ground truths in the dataset, there can be more than one manpu expression appropriate to a situation. It would be useful to conduct user experiments on preferences for manpu expression in future research.
There are some limitations to using GPTs. The current generation speed of GPTs is too slow for real-time interaction. In addition, there is a limit to the number of files that can be uploaded as knowledge data.

Discussion
As shown in the preliminary results, GPTs can generate some manpu information. When only the basic data was given as knowledge, the GPTs tended to output more manpus. When image or annotation data was given as knowledge, the output manpu expressions were simplified. This suggests that GPTs could learn the appropriate amount of manpu from the knowledge data.

CONCLUSION
In this study, we proposed a novel method of emotional expression by robots using manpu. We categorized manpu and developed an annotation tool for extracting manpu from comics as data associated with characters and dialogues. We used GPTs to generate manpu from the dialogue context based on example situations from the comic, and compared the generation results across the different sets of knowledge data given to the GPTs.
In future work, we plan to build an annotation method that integrates better with the existing annotation data, such as mapping manpu to frames in the comics, and to complete the manpu annotation. We also plan to try other deep learning models and to conduct user experiments to evaluate the system and obtain feedback. In addition, support for colored manpus needs to be considered. Since the final goal of this research is to use manpu expression in real-world human-robot interaction, the manpu system needs to be linked to a real robot through the MR space.

Figure 1: Examples of manpu expression used in comics.

Figure 2: Illustration showing examples of the classified manpu types.

Figure 3: Prototype of developed annotation tool for manpu.

Figure 4: Preliminary results of manpu generation using GPTs. The prompt is: 'You are told "All right!! What? Please go ahead and say it." You will answer "What? Are you sure?" Please generate the manpu expression.'