The rise of deep learning technologies, backed by the growing accessibility of ever-larger corpora, has contributed to the development of increasingly advanced dialogue generation systems. At Zaion, we research these more advanced models with the aim of eventually applying them to customer service.
Beyond the issues of grammar or fluency in the generated content, the main challenge a conversational agent faces in human-machine interaction is consistency. It covers several sub-problems, such as:
- Logical consistency (relevance of the generated answer to the conversation history)
- Persona consistency (relevance of the bot’s behaviour to its past behaviour in the interaction)
- Social consistency (relevance of the answer to accepted social etiquette and rules)
- Emotional consistency (relevance to the user’s emotional state)
Consistency is important to avoid situations where users would find the bot “strange”. A task-oriented agent often represents an entity that provides a service and, as such, needs to be reliable from a business standpoint while also offering an enjoyable social interaction. Even though socio-emotional bots exist in task-oriented applications, most of these systems are designed and used for the open domain.
While logical consistency can largely be addressed with more computing power, the other three points are much less studied, even though many studies have shown how beneficial emotionally aware behaviour is to the overall user experience.
However, complex concepts such as emotion are tricky to annotate because the task is so subjective, which is why it is hard to obtain relevant and reliable data. In deep learning, where data is at the heart of the systems’ learning processes and where huge amounts of it are required, this quickly becomes a significant challenge.
In this series of two blog posts, I will present the different methods used in the literature to collect and annotate socio-emotional data, and the strategies studied with it:
- How is socio-emotional data collected and annotated?
- Which socio-emotional strategies of interest have been studied?
How is socio-emotional data collected and annotated?
In deep learning, data is crucial because it is what the systems learn their representations from. For example, at Zaion we can use our customers’ data (within the limits of our contracts) and have it labelled by our expert annotation team before using it to train our language models.
Comparable collection and annotation approaches are used in the literature on socio-emotional conversational systems. The datasets I will mention are, for the most part, corpora of textual data, some of it transcribed from audio sources.
Dataset Collection and Annotation
We will go over three main collection approaches as well as the associated annotation methods we have observed in the literature.
Crowdsourced
Collection: Crowdsourcing, when applied to data collection, is a participatory method where a group of people contributes to creating data samples. The crowd-sourced data is typically Human-Human interaction (H-H). These interactions usually involve a speaker, or seeker, who conveys an emotion, and a listener, or helper, who has to answer accordingly. Dialogue systems are then trained to perform the role of the listener. The data is collected by having two crowd workers interact while following set guidelines. For empathetic dialogues, workers taking the role of speakers are asked to start the conversation following an emotional prompt. The listeners must adapt their replies to the context as presented by their interlocutor, without being aware of the prompt or the situation beforehand. One such example is the ESConv dataset.
Derived Annotations: For crowdsourcing, the labels (emotions and dialogue strategies) associated with the data are directly derived from the instructions given to the workers. Additionally, surveys submitted to the workers during the collection process can be answered on both the listener and the speaker side, which provides further annotations, such as empathy grading and utterance-level dialogue strategies.
Crawled from online sources
Collection: Another common way to collect data is crawling, that is, extracting information from online sources. In the case of textual data, this often means posts and comments crawled from social media, which are natural Human-Human speech. The data can also come from other sources (such as OpenSubtitles) where it is scripted. The data extracted from those websites is usually not labelled, so annotation processes must be designed to label the corpora.
Manual Annotations: When datasets are small, or if the research team has the material means, data can be entirely annotated by human experts or by annotators who have been trained on that specific annotation task. For example, DailyDialog was annotated by three experts with a good understanding of dialogue and communication theory, who were taught the guidelines of the task (emotion and dialogue act annotation).
Semi-automatic Annotations: Manual annotation is usually paired with algorithms to accelerate the work and lighten the human judges’ workload. This hybrid approach is called semi-automatic annotation. In general, the first step is to have human judges annotate a small fraction of the collected dialogues. To provide further support to the human judges, the EDOS dataset team used a BERT-based model, trained on the empathetic dialogues dataset, to output the 3 most likely emotion labels for each dialogue. This spares the judges from choosing among all 42 available labels: they pick between 3 suggestions instead, with the possibility to select one of the other labels if needed. The second and last step is to use this manually annotated data as training data for a classifier that automatically annotates the rest of the collected data.
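To make the label-suggestion step more concrete, here is a minimal sketch in Python. It assumes a classifier has already produced one score per label for a dialogue. The function names, label names and scores are hypothetical and are not taken from the EDOS team's code.

```python
from typing import Dict, List


def top_k_labels(scores: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k highest-scoring labels, best first."""
    return sorted(scores, key=lambda label: scores[label], reverse=True)[:k]


def record_choice(suggestions: List[str], all_labels: List[str], choice: str) -> dict:
    """Store the judge's choice and whether it came from the suggestions.

    The judge may still pick any label from the full label set.
    """
    if choice not in all_labels:
        raise ValueError(f"unknown label: {choice}")
    return {"label": choice, "from_suggestions": choice in suggestions}


# Hypothetical classifier scores for one dialogue (illustration only).
scores = {"sad": 0.50, "angry": 0.30, "joyful": 0.10, "neutral": 0.06, "afraid": 0.04}
suggestions = top_k_labels(scores)  # ["sad", "angry", "joyful"]
annotation = record_choice(suggestions, list(scores), "afraid")
```

Keeping track of whether the judge stayed within the suggestions is one possible way to monitor how much the model helps. The dialogues annotated this way could then serve as training data for the classifier that labels the rest of the corpus.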
Derived Annotations: The context in which web data was posted can also be used. For example, for the PEC dataset, posts and comments were extracted from two subreddits: happy and offmychest. The original Reddit environment thus provides a label, and what is left to do is to conduct a quality check by asking human annotators to annotate a small set of the conversations (100 from the happy subreddit, 100 from the offmychest subreddit and 100 from another subreddit, casualconversations, as a control).
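As a rough illustration of derived annotation followed by a quality check, the sketch below labels each post with the subreddit it came from and then draws a fixed-size random sample per label for human review. The record fields (`id`, `subreddit`, `text`) and the function names are assumptions made for this example, and using the subreddit name itself as the label is a simplification, not a description of how PEC was built.

```python
import random
from collections import defaultdict
from typing import Dict, List


def derive_labels(posts: List[dict]) -> List[dict]:
    """Attach a label derived from the source subreddit of each post."""
    return [{**post, "label": post["subreddit"]} for post in posts]


def sample_for_review(labelled: List[dict], per_label: int = 100, seed: int = 0) -> Dict[str, List[dict]]:
    """Draw up to `per_label` random items per label for human checking."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_label = defaultdict(list)
    for item in labelled:
        by_label[item["label"]].append(item)
    return {
        label: rng.sample(items, min(per_label, len(items)))
        for label, items in by_label.items()
    }


# Toy data standing in for crawled posts (illustration only).
posts = [
    {"id": f"{name}-{i}", "subreddit": name, "text": "..."}
    for name in ("happy", "offmychest", "casualconversations")
    for i in range(150)
]
review_batches = sample_for_review(derive_labels(posts))
```

Human annotators would then label the sampled conversations independently, and their labels can be compared with the derived ones.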
Retrieved from deployed services
Collection: When services such as customer assistance or chatbots are already deployed, it is possible to retrieve their logs and compile them into corpora. Most of the time, this concerns human-machine interactions, but it can also apply to human-human conversations (such as call center data).
Manual/Semi-Automatic Annotation: This type of data can use the same annotation schemes as crawled data: human annotation, possibly assisted by automatic approaches as described above. For the EmoContext dataset, 50 human annotators manually annotated 300 dialogues for each of the 4 classes, and each dialogue was looked over by 7 judges. These annotated dialogues were embedded as vectors, and then used along with cosine similarity thresholds to find similar occurrences in the non-annotated pool of data. The results were then checked by human judges, who made the final ruling.
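The similarity-based expansion step can be sketched as follows. This is a simplified illustration, not the EmoContext pipeline itself: it assumes the dialogues have already been embedded as plain lists of floats, and the function names, toy vectors and threshold value are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def propose_candidates(
    seeds: Dict[str, List[Vector]],
    pool: Dict[str, Vector],
    threshold: float,
) -> List[Tuple[str, str, float]]:
    """Return (dialogue_id, label, similarity) for pool items close to a seed.

    Each unlabelled dialogue takes the label of its most similar annotated
    seed, and is kept only if that similarity reaches the threshold.
    """
    candidates = []
    for dialogue_id, vector in pool.items():
        best_label, best_sim = None, -1.0
        for label, seed_vectors in seeds.items():
            for seed in seed_vectors:
                sim = cosine_similarity(vector, seed)
                if sim > best_sim:
                    best_label, best_sim = label, sim
        if best_label is not None and best_sim >= threshold:
            candidates.append((dialogue_id, best_label, best_sim))
    return candidates


# Toy 2-dimensional embeddings (illustration only).
seeds = {"happy": [[1.0, 0.0]], "sad": [[0.0, 1.0]]}
pool = {"d1": [0.9, 0.1], "d2": [0.5, 0.5], "d3": [0.0, 2.0]}
to_review = propose_candidates(seeds, pool, threshold=0.9)
```

In this toy example, `d1` and `d3` are proposed for review while `d2` falls below the threshold. The proposals are only candidates: as in the process described above, human judges would still make the final decision.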
User Feedback: Some bots in production can ask for customer satisfaction feedback, either directly or through surveys. This information can be used to annotate certain conversations.