Ishikawa, Japan -- Speech and language recognition technology is a rapidly developing field that has led to the emergence of novel speech dialog systems, such as Amazon Alexa and Siri. A significant milestone in the development of dialog artificial intelligence (AI) systems is the addition of emotional intelligence. A system that can recognize the user's emotional state, in addition to understanding language, could generate more empathetic responses, leading to a more immersive experience for the user.
“Multimodal sentiment analysis” refers to a group of methods that constitute the gold standard for an AI dialog system with sentiment detection. These methods can automatically analyze a person's psychological state from their speech, voice color (tone of voice), facial expression, and posture, and are crucial for human-centered AI systems. The technique could potentially realize an emotionally intelligent AI with beyond-human capabilities that understands the user's sentiment and generates responses accordingly.
However, current emotion estimation methods focus only on observable information and do not account for unobservable signals, such as physiological signals. Such signals are a potential gold mine of emotional cues that could improve sentiment estimation performance tremendously.
In a new study published in the journal IEEE Transactions on Affective Computing, a collaborative team from Japan, comprising Associate Professor Shogo Okada of the Japan Advanced Institute of Science and Technology (JAIST) and Prof. Kazunori Komatani of the Institute of Scientific and Industrial Research at Osaka University, added physiological signals to multimodal sentiment analysis for the first time. “Humans are very good at concealing their feelings. The internal emotional state of a user is not always accurately reflected by the content of the dialog, but since it is difficult for a person to consciously control their biological signals, such as heart rate, it may be useful to use these for estimating their emotional state. This could make for an AI with sentiment estimation capabilities that are beyond human,” explains Dr. Okada.
The team analyzed 2,468 exchanges with a dialog AI, obtained from 26 participants, to estimate the level of enjoyment experienced by the user during the conversation. The users were then asked to assess how enjoyable or boring they found the conversation. The team used the multimodal dialogue dataset named “Hazumi1911,” which uniquely combines speech recognition, voice color sensing, facial expression and posture detection with skin potential, a form of physiological response sensing.
“On comparing all the separate sources of information, the biological signal information proved to be more effective than voice and facial expression. When we combined the language information with biological signal information to estimate the self-assessed internal state while talking with the system, the AI’s performance became comparable to that of a human,” comments an excited Dr. Okada.
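To make the idea of combining modalities concrete, the sketch below illustrates simple feature-level fusion of language and physiological features for predicting self-reported enjoyment. It is not the authors' pipeline: the feature vectors, dimensions, and the logistic-regression classifier are placeholder assumptions standing in for whatever text encoder, skin-potential statistics, and model the study actually used.

```python
# Hypothetical sketch (not the study's code): feature-level fusion of
# linguistic and physiological (e.g., skin-potential) features to classify
# whether a user found an exchange enjoyable. All features and labels here
# are random placeholders for real per-exchange representations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

n_exchanges = 2468                 # number of dialog exchanges, as in the study
language_dim, physio_dim = 64, 8   # assumed feature sizes

# Placeholder per-exchange features; in practice these would come from a
# text encoder and from skin-potential statistics (mean, slope, peaks, ...).
X_language = rng.normal(size=(n_exchanges, language_dim))
X_physio = rng.normal(size=(n_exchanges, physio_dim))
y_enjoyment = rng.integers(0, 2, size=n_exchanges)   # self-reported label

# Early (feature-level) fusion: concatenate the modalities per exchange.
X_fused = np.concatenate([X_language, X_physio], axis=1)

X_train, X_test, y_train, y_test = train_test_split(
    X_fused, y_enjoyment, test_size=0.2, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("fused-modality accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

With real features, comparing this fused model against single-modality baselines (language only, physiology only) is one common way to quantify how much each signal contributes to sentiment estimation.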
These findings suggest that detecting physiological signals, which typically remain hidden from view, could pave the way for highly emotionally intelligent AI-based dialog systems, making for more natural and satisfying human-machine interactions. Moreover, emotionally intelligent AI systems could help identify and monitor mental illness by sensing changes in daily emotional states. They could also come in handy in education, where the AI could gauge whether the learner is interested in and excited about a topic of discussion, or bored by it, prompting changes in teaching strategy and more efficient educational services.
###
Title of original paper: Effects of Physiological Signals in Different Types of Multimodal Sentiment Estimation
Journal: IEEE Transactions on Affective Computing
DOI: 10.1109/TAFFC.2022.3155604
About Japan Advanced Institute of Science and Technology, Japan
Founded in 1990 in Ishikawa prefecture, the Japan Advanced Institute of Science and Technology (JAIST) was the first independent national graduate school in Japan. Now, after 30 years of steady progress, JAIST has become one of Japan's top-ranking universities. JAIST has multiple satellite campuses and strives to foster capable leaders with a state-of-the-art education system where diversity is key; about 40% of its alumni are international students. The university has a unique style of graduate education based on a carefully designed coursework-oriented curriculum to ensure that its students have a solid foundation on which to carry out cutting-edge research. JAIST also works closely with both local and overseas communities by promoting industry–academia collaborative research.
About Associate Professor Shogo Okada from Japan Advanced Institute of Science and Technology, Japan
Shogo Okada runs the Laboratory for Computational modeling for understanding and generating multimodal social signal patterns, part of the Intelligent Robotics Area at the Japan Advanced Institute of Science and Technology, where he holds the position of Associate Professor. His research focuses on building computational models of multimodal social signals using speech signal processing, image processing, motion sensor processing, and pattern recognition techniques, as well as on the use of multimodal networks for machine learning and data mining applications. He has 79 publications with over 350 citations to his credit. For more information, visit: https://www.jaist.ac.jp/english/areas/ir/laboratory/okada.html
Article Publication Date: 3-Mar-2022