Social robots are inherently multimodal: they can use functional gestures, navigate socially, and even infer different intents in conversation, all while computing and displaying behaviors directed at the humans they interact with. So how do the individual modalities that make up robot behavior affect human behavior during an interaction?
Karen Tatarian is a robotic engineer and researcher who recently passed her doctoral defense at Sorbonne University with distinction, earning a Ph.D. in Robotics and Artificial Intelligence.
With the rapid development of artificial intelligence technology and the field of robotics, Karen has been striving to make human-centered social intelligent robots and products a reality.
Recently, Karen published a paper titled "How does Modality Matter? Investigating the Synthesis and Effects of Multi-modal Robot Behavior on Social Intelligence" in "Computer Vision News," which studies how the multimodal behavior of the Pepper robot affects its perceived social intelligence.
The work involved collecting data from 115 participants, each interacting with the autonomous robot for an average of 7 minutes. The author has also open-sourced the code at the link below:
https://github.com/KarenTatarian/multimodal_socialcues
What role does modality play?
— Studying the impact of multimodal robot behavior on social intelligence
Overview
By observing humans, we find that social interaction relies on cues that allow others to understand our actions and infer our intentions. These social signals and non-verbal behaviors are complex and multimodal, meaning they are composed of different modalities and cues, such as gestures, gaze behavior, and spatial behavior (e.g., proxemics, the management of space and distance). Therefore, a robot perceived as socially intelligent must be able to successfully engage in social communication, adapt to its social environment, and display appropriate multimodal behavior.
In this work, the author first investigates how one modality can help adapt another; then explores the effects of executing multiple modalities together on behavioral interaction outcomes and on the robot's perceived social intelligence; and finally presents a model built on reinforcement-learning principles. This model lets the robot learn how to combine multimodal behaviors using a reward function based on the multimodal social signals humans produce during the interaction.
1. Robots Adapting to Group Settings Using Human Behavior
Modalities naturally occur in combination, and adapting one modality to changes in the environment relies on what is sensed through others. For example, in the author's first paper, the robot proactively changed its gaze patterns in response to changes in the social interaction (i.e., the groups forming around it). Participants' spatial behavior was used to estimate their roles within the group formation, such as active speaker, bystander, or listener (Figure 1). Compared with a baseline robot that simply shifted its gaze toward newly detected stimuli, participants stood closer to the adaptive Pepper, gave it higher adaptability and sociability scores, and also felt that Pepper cared about them.
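As a rough illustration of the idea (not the author's implementation), the sketch below estimates participant roles from simple spatial cues and picks a gaze target accordingly; the `Participant` fields and the distance threshold are assumptions made for the example.

```python
# Hypothetical sketch: estimating participant roles from spatial cues
# and adapting the robot's gaze target accordingly.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Participant:
    pid: str
    distance_m: float    # distance to the robot
    facing_robot: bool   # roughly oriented toward the robot
    is_speaking: bool    # voice activity detected

def classify_role(p: Participant) -> str:
    """Very rough role heuristic: speaker > listener > bystander."""
    if p.is_speaking:
        return "active_speaker"
    if p.distance_m < 1.5 and p.facing_robot:
        return "listener"
    return "bystander"

def choose_gaze_target(group: List[Participant]) -> Optional[str]:
    """Prefer the active speaker; otherwise the closest listener."""
    speakers = [p for p in group if classify_role(p) == "active_speaker"]
    if speakers:
        return speakers[0].pid
    listeners = [p for p in group if classify_role(p) == "listener"]
    if listeners:
        return min(listeners, key=lambda p: p.distance_m).pid
    return None  # no one to attend to; keep idle gaze behavior

if __name__ == "__main__":
    group = [
        Participant("A", 0.9, True, False),
        Participant("B", 1.2, True, True),
        Participant("C", 2.5, False, False),
    ]
    print(choose_gaze_target(group))  # -> "B", the active speaker
```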
2. Understanding the Effects of Robot Multimodal Behavior
The multimodal behavior is composed of gaze mechanisms (turn-taking, turn-yielding, grounding, and joint attention), proxemics achieved through social navigation, social gestures (symbolic, deictic, and beat gestures), and an autonomously conducted social dialogue. How each modality plays out in different situations was then studied through behavioral outcomes and subjective measures.
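To make the composition concrete, here is a minimal sketch, not taken from the open-source repository, of how one "multimodal act" might bundle gaze, gesture, proxemics, and speech into a single unit; the class and field names are hypothetical.

```python
# Hypothetical structure for one bundled "multimodal act", so that gaze,
# gesture, proxemics, and speech are synthesized together rather than
# controlled in isolation.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class GazeMechanism(Enum):
    TURN_TAKING = "turn_taking"
    TURN_YIELDING = "turn_yielding"
    GROUNDING = "grounding"
    JOINT_ATTENTION = "joint_attention"

class GestureType(Enum):
    SYMBOLIC = "symbolic"   # emblematic gestures, e.g. waving hello
    DEICTIC = "deictic"     # pointing at an object of interest
    BEAT = "beat"           # rhythmic gestures accompanying speech

@dataclass
class MultimodalAct:
    utterance: str                               # what the dialogue manager says
    gaze: GazeMechanism                          # gaze mechanism for this turn
    gesture: Optional[GestureType] = None        # co-speech gesture, if any
    approach_distance_m: Optional[float] = None  # proxemic target, if navigating

# Example: greeting a newly arrived participant.
greeting = MultimodalAct(
    utterance="Hello, nice to meet you!",
    gaze=GazeMechanism.TURN_YIELDING,
    gesture=GestureType.SYMBOLIC,
    approach_distance_m=1.2,
)
print(greeting)
```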
An overview of the system is shown in Figure 2, and the code is available in the GitHub repository linked above. The experiment collected data from 105 participants, each of whom interacted with the robot alone for 7 minutes, in order to investigate behavioral outcomes including, but not limited to, the distance users kept, their speaking time, the greetings they gave, and their responses to greetings. The study shows to what extent each modality in the robot's multimodal behavior affects how humans position themselves, how participants address the robot, whether participants accept the robot's suggestions, and how they open and close the interaction by mirroring the robot's non-verbal behaviors.
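As a purely illustrative sketch of how such behavioral outcomes could be summarized from an interaction log, the snippet below computes a few of the measures mentioned above; the log fields are assumptions, not the study's actual data format.

```python
# Hypothetical sketch: summarizing one participant's interaction log
# into a few behavioral outcome measures.
from statistics import mean

def behavioral_outcomes(log: dict) -> dict:
    """Reduce a per-interaction log to simple outcome measures."""
    return {
        "mean_distance_m": mean(log["distance_samples_m"]),
        "speaking_time_s": sum(end - start for start, end in log["speech_segments_s"]),
        "returned_greeting": log["robot_greeted"] and log["participant_greeted_back"],
        "accepted_suggestion": log["suggestion_accepted"],
    }

example_log = {
    "distance_samples_m": [1.4, 1.2, 1.1, 1.3],
    "speech_segments_s": [(10.0, 14.5), (30.0, 41.0)],
    "robot_greeted": True,
    "participant_greeted_back": True,
    "suggestion_accepted": False,
}
print(behavioral_outcomes(example_log))
```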
3. Adaptation and Personalization in Human-Robot Interaction (HRI)
Users expect the agents, robots, and technology they interact with to adapt to them, as this in turn improves usability. In human-robot interaction, an adaptive system does not necessarily teach the agent new behaviors; rather, it decides what to adapt by combining different behaviors and determining when to make those adjustments. Machine learning offers a way to achieve adaptation in human-robot interaction, and it can also serve as a way for the robot to evaluate its own behavior. However, as the author points out, human behavior and social signals are inherently complex, dynamic, and continuous: discretizing them leads to a large state space and to information loss, as illustrated below. In addition, training machine learning models for human-robot interaction is costly, and the COVID-19 pandemic showed that it can at times be very challenging.
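The following sketch illustrates the discretization problem: binning just four continuous social signals already produces dozens of states, and the bins themselves discard information. The signals and bin boundaries are assumptions chosen for the example, not those used in the thesis.

```python
# Hypothetical sketch of the state-space blow-up from discretizing
# continuous social signals into coarse bins.
BINS = {
    "distance": ["intimate", "personal", "social", "public"],  # proxemic zones
    "gaze_at_robot": ["none", "brief", "sustained"],
    "speech_activity": ["silent", "speaking"],
    "facial_valence": ["negative", "neutral", "positive"],
}

def discretize(distance_m: float, gaze_ratio: float, speaking: bool, valence: float):
    """Map continuous readings onto coarse bins (one possible scheme)."""
    d = ("intimate" if distance_m < 0.45 else
         "personal" if distance_m < 1.2 else
         "social" if distance_m < 3.6 else "public")
    g = "none" if gaze_ratio < 0.1 else "brief" if gaze_ratio < 0.5 else "sustained"
    s = "speaking" if speaking else "silent"
    v = "negative" if valence < -0.2 else "positive" if valence > 0.2 else "neutral"
    return (d, g, s, v)

n_states = 1
for values in BINS.values():
    n_states *= len(values)
print(n_states)                          # 4 * 3 * 2 * 3 = 72 states from just 4 signals
print(discretize(1.0, 0.6, True, 0.3))   # ('personal', 'sustained', 'speaking', 'positive')
```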
To address the latter issue, the author used the rich data collected in previous work to construct a simulated human-robot interaction setting and environment in which to train machine learning models for such use cases. To address the adaptation problem itself, the author studied how to formulate reward signals from humans' multimodal social signals; these rewards are then used to adjust the robot's multimodal behavior, producing different combinations of gaze, gestures, proxemics, and emotional expressions, with the aim of increasing the robot's social intelligence and influence. The reward function is designed to reflect the complexity and dynamics of human-robot interaction. The experimental results help us investigate which combinations of modalities the agent chooses to compose the robot's behavior. These findings are crucial for advancing the social intelligence of future technologies, so that robots can adapt to humans, learn from humans, and communicate with humans at a level beyond language.
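As a hedged sketch of the general idea (not the thesis implementation), the snippet below runs tabular Q-learning in which each action is a combination of modalities and the reward aggregates the human's multimodal social signals; the signal names, weights, states, and action sets are illustrative assumptions.

```python
# Hypothetical sketch: Q-learning over modality combinations with a reward
# built from the human's multimodal social signals.
import random
from itertools import product
from collections import defaultdict

GAZE = ["neutral", "joint_attention"]
GESTURE = ["none", "beat", "deictic"]
PROXEMICS = ["hold", "approach"]
ACTIONS = list(product(GAZE, GESTURE, PROXEMICS))  # all modality combinations

def social_reward(signals: dict) -> float:
    """Combine human social signals into one scalar reward (illustrative weights)."""
    return (0.4 * signals["engagement"]     # e.g. gaze toward the robot
            + 0.3 * signals["valence"]      # facial expression estimate
            + 0.3 * signals["proximity"])   # whether the person stays close

alpha, gamma, epsilon = 0.1, 0.9, 0.2
Q = defaultdict(float)  # keyed by (state, action)

def choose_action(state):
    """Epsilon-greedy action selection over modality combinations."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, signals, next_state):
    """Standard Q-learning update using the social-signal reward."""
    r = social_reward(signals)
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (r + gamma * best_next - Q[(state, action)])

# One simulated step; in the thesis this would be driven by the simulated
# HRI environment built from previously collected data.
state, next_state = "user_listening", "user_speaking"
action = choose_action(state)
update(state, action, {"engagement": 0.8, "valence": 0.2, "proximity": 1.0}, next_state)
print(action, Q[(state, action)])
```

The design choice illustrated here is that the reward is never a hand-scored label but a function of the signals the human naturally emits during the interaction, which is what allows the robot to evaluate and adapt its own multimodal behavior online.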