Audio Emotion Detection

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53.

It achieves the following results on the evaluation set:

Loss: 0.9555
Accuracy: 0.6262

Model description

The model classifies an audio clip into one of seven emotion labels: Angry, Disgusted, Fearful, Happy, Neutral, Sad, and Surprised. It was trained on audio sampled at 16,000 Hz, so inputs should be resampled to 16 kHz before inference for the model to work properly.
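As an illustration of the 16 kHz requirement, the sketch below resamples a mono waveform to 16,000 Hz using plain linear interpolation. This is not the preprocessing used to train the model; the function name `resample_linear` is hypothetical, and the method is deliberately naive (no anti-aliasing filter). In practice a dedicated resampler such as `torchaudio.functional.resample` or `librosa.resample` is likely a better choice.

```python
def resample_linear(samples, orig_sr, target_sr=16000):
    """Resample a mono waveform (sequence of floats) to target_sr.

    Illustrative only: uses linear interpolation between neighbouring
    samples and applies no low-pass filter, so downsampling may alias.
    """
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sampling rates must be positive")
    if orig_sr == target_sr or len(samples) < 2:
        return list(samples)

    n_out = int(round(len(samples) * target_sr / orig_sr))
    step = orig_sr / target_sr  # input samples advanced per output sample
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = i * step                # fractional position in the input
        lo = min(int(pos), last)      # left neighbour
        hi = min(lo + 1, last)        # right neighbour (clamped at the end)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# Example: one second of silence recorded at 48 kHz becomes 16,000 samples.
one_second_48k = [0.0] * 48000
resampled = resample_linear(one_second_48k, orig_sr=48000)
print(len(resampled))
```

The resampled waveform, together with its sampling rate of 16000, is what should be handed to the model's feature extractor.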

Training and evaluation data

mozilla-foundation/common_voice_6_0
speech-recognition-community-v2/dev_data

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 32
eval_batch_size: 32
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 256
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.01
num_epochs: 4
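The relationship between these values can be checked with a little arithmetic. The sketch below recomputes the total train batch size and evaluates a linear warmup-then-decay learning-rate schedule. It makes several assumptions: a single training device (which is what 32 x 8 = 256 implies), 160 total optimizer steps (taken from the last row of the Training results table), and warmup steps rounded up from the warmup ratio. The function names are hypothetical and this is not the training script itself.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 0.0005
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.01

# Assumption: 160 optimizer steps in total, per the Training results table.
TOTAL_STEPS = 160
# Assumption: a single device, consistent with 32 * 8 = 256.
NUM_DEVICES = 1


def effective_batch_size(per_device, accum_steps, num_devices=NUM_DEVICES):
    """Number of examples contributing to each optimizer step."""
    return per_device * accum_steps * num_devices


def linear_lr(step, base_lr=LEARNING_RATE, total_steps=TOTAL_STEPS,
              warmup_ratio=WARMUP_RATIO):
    """Learning rate at a given optimizer step for linear warmup and decay."""
    # Assumption: the warmup step count is rounded up (1.6 -> 2 here).
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
for step in (0, 1, 2, 81, 160):
    print(step, linear_lr(step))
```

Under these assumptions the warmup lasts only 2 steps, so the learning rate sits near its peak of 0.0005 almost immediately and then decays to 0 by step 160.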

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
1.5875	1.0	40	1.2574	0.5133
1.1637	2.0	80	1.0852	0.5590
0.9827	3.0	120	1.0048	0.6090
0.8683	4.0	160	0.9555	0.6262

Model ID: Hatman/audio-emotion-detection
