Zaion Emotion Dataset (ZED): A fine-grained emotion dataset for Speech Emotion Diarization

The new Speech Emotion Diarization feature has been integrated into Zaion Voice Analytics.

The ZED dataset is free to download.

20 June 2023

Yingzhi Wang, Deep Learning Engineer at Zaion Lab, SpeechBrain collaborator, engineer from CentraleSupélec.

I’m passionate about speech technologies so I talk too much.

Speech Emotion Diarization

Speech Emotion Diarization (paper link) is a task proposed for fine-grained speech emotion recognition. Just as Speaker Diarization answers the question "Who speaks when?", Speech Emotion Diarization answers the question "Which emotion appears when?".

 

[Figure: utterance-level Speech Emotion Recognition (SER) compared with Speech Emotion Diarization (SED)]

 

The Speech Emotion Diarization task takes an utterance as input, determines whether particular emotions are present in it, and locates the boundaries of each one. The image above compares traditional utterance-level Speech Emotion Recognition (SER) with Speech Emotion Diarization (SED).
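To make the idea of emotion boundaries concrete, here is a minimal Python sketch that turns a sequence of per-frame emotion labels into segments with start and end times. It only illustrates the output format of the task and is not the model described in the paper. The function name `frames_to_segments` and the 20 ms frame hop are assumptions chosen for this example.

```python
from itertools import groupby


def frames_to_segments(frame_labels, frame_hop=0.02):
    """Merge runs of identical frame labels into (label, start, end) segments.

    frame_hop is the time between two frames in seconds. The default of
    0.02 s is an assumption made for this sketch, not a value from ZED.
    """
    segments = []
    index = 0  # index of the first frame of the current run
    for label, run in groupby(frame_labels):
        length = len(list(run))
        start = round(index * frame_hop, 3)
        end = round((index + length) * frame_hop, 3)
        segments.append((label, start, end))
        index += length
    return segments


# Five neutral frames followed by three happy frames.
frames = ["neutral"] * 5 + ["happy"] * 3
for label, start, end in frames_to_segments(frames):
    print(f"{label}: {start}s to {end}s")
```

An utterance-level SER system would return a single label for the whole input, whereas the segment list above also says when each emotion starts and ends.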

Zaion Emotion Dataset (ZED)

The lack of available datasets is a major obstacle to research on fine-grained speech emotion recognition. We have released the Zaion Emotion Dataset (ZED), in which each spoken utterance is annotated with discrete emotion labels and with frame-level emotion boundaries.

An example:

 

[Figure: example ZED annotation for one utterance]

This example shows an audio clip in which happiness is perceived from 1.797 seconds to the end of the audio, while the first 1.797 seconds are neutral. Note that ZED.json lists only the non-neutral emotions of an utterance; the rest of the utterance is neutral by default.
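Because only non-neutral emotions are stored, a full timeline has to be rebuilt by filling the gaps with neutral segments. The sketch below shows one way to do this in Python. The field names `emo`, `start` and `end`, the helper `fill_neutral`, and the 3.0 second total duration are assumptions made for illustration; check ZED.json for the actual schema.

```python
def fill_neutral(emotion_segments, duration):
    """Return a full timeline, adding neutral wherever no emotion is annotated.

    emotion_segments: list of dicts with hypothetical keys "emo", "start", "end".
    duration: total length of the utterance in seconds.
    """
    timeline = []
    cursor = 0.0  # end time of the part of the utterance covered so far
    for seg in sorted(emotion_segments, key=lambda s: s["start"]):
        if seg["start"] > cursor:
            # Gap before this emotion: neutral by default.
            timeline.append({"emo": "neutral", "start": cursor, "end": seg["start"]})
        timeline.append(dict(seg))
        cursor = max(cursor, seg["end"])
    if cursor < duration:
        # Trailing gap after the last annotated emotion.
        timeline.append({"emo": "neutral", "start": cursor, "end": duration})
    return timeline


# Happiness from 1.797 s to the end of a clip assumed to last 3.0 s.
annotated = [{"emo": "happy", "start": 1.797, "end": 3.0}]
for seg in fill_neutral(annotated, duration=3.0):
    print(seg)
```

For the example above, this yields a neutral segment covering the first 1.797 seconds followed by the happy segment.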

Some basic statistics of ZED:

language                English
transcription
number of utterances    180
duration                17 minutes
emotions                Happy, Sad, Angry, Neutral
speakers                73

 

Call For Contribution

If you are interested in helping us enlarge the dataset or expand the dataset to other languages, please contact us.

 

Emotion Diarization as a key feature for your Voice Analytics tool

The new Speech Emotion Diarization feature has been integrated into our Voice Analytics platform. This key component will soon be available to our customers!

Recognizing your users' emotions with high accuracy and fine granularity lets you identify key moments in conversations and pinpoint pain points in your customer service.
