This project develops a speech emotion recognition system using Mel spectrograms and Convolutional Neural Networks (CNNs). The dataset is the Acted Emotional Speech Dynamic Database (AESDD), which contains audio files categorized into five emotions: angry, disgust, fear, happy, and sad.
- Transform raw audio files into numerical representations using Mel spectrograms (a minimal extraction sketch follows this list).
- Train CNNs to classify emotions from these spectrograms (an illustrative model sketch also follows the list).
- Address challenges such as:
  - Variable input shapes.
  - Small dataset size.
- Compare performance across three setups: the original spectrograms trained with a batch size of 1, resized spectrograms, and an artificially augmented dataset.
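
As a rough illustration of the first step, the sketch below converts an audio file to a log-scaled Mel spectrogram with `librosa`. The parameter values (`sr`, `n_mels`) and the file name are placeholders; the exact settings used in `notebook.ipynb` may differ.

```python
import librosa
import numpy as np

# Illustrative parameters -- the values used in notebook.ipynb may differ.
def audio_to_mel(path, sr=22050, n_mels=128):
    y, _ = librosa.load(path, sr=sr)                 # load and resample the clip
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)      # log scale for the CNN input

spec = audio_to_mel("example.wav")                   # hypothetical file name
print(spec.shape)  # (n_mels, time_frames) -- the time axis varies per clip
```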
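And a minimal sketch of how a CNN can accept spectrograms of variable shape: an adaptive (global) average-pooling layer collapses whatever time length arrives, which is one way to train on the original, unresized spectrograms one at a time (batch size 1). This is written in PyTorch purely for illustration; the layer counts and channel sizes are assumptions, not the architecture from the notebook.

```python
import torch
import torch.nn as nn

class EmotionCNN(nn.Module):
    # Hypothetical architecture -- illustrates the variable-input idea only.
    def __init__(self, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)  # collapses any spatial size to 1x1
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                    # x: (batch, 1, n_mels, time)
        x = self.pool(self.features(x))
        return self.fc(x.flatten(1))

model = EmotionCNN()
# Two different time lengths pass through the same network unchanged.
print(model(torch.randn(1, 1, 128, 300)).shape)  # torch.Size([1, 5])
print(model(torch.randn(1, 1, 128, 512)).shape)  # torch.Size([1, 5])
```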
- The best results were obtained using the original dataset with a batch size of 1, achieving an accuracy of 74.38%.
- Resized spectrograms resulted in lower accuracy, likely due to loss of crucial information during resizing.
- Artificially augmented data achieved accuracy comparable to the original dataset, though the augmentation itself may introduce some information loss.
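
For reference, raw-audio augmentation might look like the sketch below. Time stretching, pitch shifting, and additive noise are common choices and are assumptions here, not necessarily the transformations used in this project.

```python
import numpy as np
import librosa

# Hypothetical augmentations -- common choices, not necessarily those used here.
def augment(y, sr):
    rate = np.random.uniform(0.9, 1.1)                      # mild tempo change
    stretched = librosa.effects.time_stretch(y, rate=rate)
    steps = np.random.randint(-2, 3)                        # up to 2 semitones
    shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=steps)
    noisy = y + 0.005 * np.random.randn(len(y))             # light white noise
    return [stretched, shifted, noisy]
```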
- Clone the repository:

  ```bash
  git clone https://github.com/SigurdST/emotion_recognition.git
  cd emotion_recognition
  ```
- Explore `notebook.ipynb` to review the full code implementation and processing steps, and `REPORT.md` for detailed explanations, results, and insights from the project.