Enabling machines to communicate like humans is a long-term goal of open-domain dialogue generation. To move toward this goal, a growing number of studies on dialogue generation focus on one key factor: emotion. An empathetic dialogue system aims to recognize the user's emotion and situation and then generate responses accordingly. Such a system can improve the user's experience and support long-term human-machine interaction. However, existing empathetic dialogue generation models ignore the continuity of each party's emotional expression across adjacent dialogue turns, which results in inadequate emotional perception. In addition, because the emotions involved in an empathetic response are flexible, it is difficult to specify a fixed empathetic policy.
To address these problems, a research team led by Donghong Han published their new research on 15 April 2024 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
The team proposed a novel empathetic dialogue generation model, ETHREED, which relies on hierarchical GRUs (gated recurrent units) to extract and track the emotional representations of both parties in a dialogue separately. In addition, the model predicts the emotional representation of the response using a stochastic policy network and guided policy search. The experimental results show that the generated responses have better diversity, empathy and relevance.
Within a single dialogue, each party's emotions tend to be continuous, or to shift toward the positive or the negative depending on context. To model this continuous process for the different parties, ETHREED uses four GRUs to obtain the global state, party state, emotional representation, and content representation of a dialogue. The global GRU tracks all utterance representations to capture context information. The party GRU models the interaction between parties. The emotion GRU tracks each party's emotions separately. The content GRU extracts the dialogue content representation and mitigates emotion perception errors. Additionally, empathy can be seen as the transfer of emotion between the parties, so the researchers define the empathetic policy as the process of predicting the listener's emotion state from the speaker's emotion state. A stochastic policy network is used to model this process, and the emotion distribution of the listener's true response serves as a constraint that guides the policy search. Lastly, the pointer generation network dynamically incorporates the predicted emotional representation of the listener and the context information to decode the response.
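The idea of a stochastic empathetic policy guided by the listener's true emotion distribution can be illustrated with a minimal sketch. It is not the ETHREED implementation: the emotion label set, the linear policy, the `guided_step` helper, and the use of a KL-divergence loss are all assumptions made for illustration, and a plain vector stands in for the speaker's emotion state that the emotion GRU would produce.

```python
import math
import random

# Hypothetical label set; the labels used in the paper are not given here.
EMOTIONS = ["joy", "sadness", "anger", "neutral"]


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy(speaker_state, weights):
    """Stochastic policy: speaker emotion state -> distribution over listener emotions.

    A single linear layer is an assumption; the paper uses a policy network.
    """
    logits = [sum(w * s for w, s in zip(row, speaker_state)) for row in weights]
    return softmax(logits)


def sample_emotion(probs, rng):
    """Sample a listener emotion index from the policy distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted), used here as the guidance signal (an assumption)."""
    return sum(t * math.log(t / max(p, eps))
               for t, p in zip(target, predicted) if t > 0)


def guided_step(speaker_state, weights, target, lr=0.5):
    """One gradient step that pulls the policy toward the true emotion distribution.

    For a softmax output, the gradient of KL(target || policy) with respect to
    logit k is (probs[k] - target[k]). Returns the loss before the update.
    """
    probs = policy(speaker_state, weights)
    for k, row in enumerate(weights):
        for j in range(len(row)):
            row[j] -= lr * (probs[k] - target[k]) * speaker_state[j]
    return kl_divergence(target, probs)


# Toy usage: the values below are made up for illustration.
rng = random.Random(0)
speaker_state = [1.0, 0.0, 0.5, 0.0]   # stand-in for the tracked speaker emotion state
target = [0.1, 0.7, 0.1, 0.1]          # stand-in for the true response emotion distribution
weights = [[0.0] * len(speaker_state) for _ in EMOTIONS]
for _ in range(200):
    loss = guided_step(speaker_state, weights, target)
probs = policy(speaker_state, weights)
print(EMOTIONS[sample_emotion(probs, rng)], round(loss, 6))
```

Because the policy outputs a distribution and not a single label, the sampled listener emotion can vary from turn to turn while staying close to the distribution observed in real responses, which fits the point that empathetic emotions are flexible and hard to fix in advance.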
Future work could introduce dialogue behavior to guide response generation and explore more reasonable evaluation metrics.
DOI: 10.1007/s11704-023-2792-7
Journal
Frontiers of Computer Science
Method of Research
Experimental study
Subject of Research
Not applicable
Article Title
Generating empathetic responses through emotion tracking and constraint guidance
Article Publication Date
15-Apr-2024