Speech Emotion Recognition Model and Inferencing using PyTorch
pip install vistec-ser
git clone https://github.com/tann9949/vistec-ser.git
cd vistec-ser
python setup.py install
We provide a Google Colaboratory example for training on the THAI SER dataset using our repository.
Note that this workflow currently supports only pre-loaded features, so it may consume an additional ~2 GB of RAM. To run the experiment, use the command shown after the description of the data split.
Since there are 80 studio recordings and 20 Zoom recordings, we split the dataset into 10 folds of 10 studios each, then evaluate using
k-fold cross-validation. We provide two k-fold experiments: one including and one excluding the Zoom recordings. This can be configured
in the config file (see examples/aisser.yaml).
python examples/train_fold_aisser.py --config-path <path-to-config> --n-iter <number-of-iterations>
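To make the fold-based evaluation more concrete, here is a minimal Python sketch of holding out one fold at a time. It is not the repository's implementation: the session names (`studio1`, `zoom1`, ...), the consecutive grouping, and the `fold_size` and `include_zoom` parameters are all assumptions made for illustration. The actual fold assignment is determined by the config file.

```python
from typing import Iterator, List, Tuple


def make_folds(n_studio: int = 80, n_zoom: int = 20,
               fold_size: int = 10, include_zoom: bool = True) -> List[List[str]]:
    """Group hypothetical session names into consecutive folds of `fold_size`."""
    sessions = [f"studio{i}" for i in range(1, n_studio + 1)]
    if include_zoom:
        # Assumption: Zoom sessions are simply appended after the studio sessions.
        sessions += [f"zoom{i}" for i in range(1, n_zoom + 1)]
    return [sessions[i:i + fold_size] for i in range(0, len(sessions), fold_size)]


def cross_validation_splits(
    folds: List[List[str]],
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train_sessions, test_sessions), holding out one fold per round."""
    for k, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != k for s in fold]
        yield train, test


if __name__ == "__main__":
    folds = make_folds(include_zoom=True)
    for k, (train, test) in enumerate(cross_validation_splits(folds)):
        print(f"fold {k}: {len(train)} train sessions, {len(test)} test sessions")
```

Under these assumptions, excluding the Zoom sessions simply leaves fewer sessions to group. The repository may handle that case differently.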
We also implement a FastAPI backend server as an example of deploying a SER model. To run the server, run
cd examples
uvicorn server:app --reload
You can customize the server by modifying the inference field in examples/thaiser.yaml.
Once the server is running, you can send an HTTP POST request in form-data format. The server returns JSON in the following format:
[
    {
        "name": <request-file-name>,
        "prob": {
            "neutral": <p(neu)>,
            "anger": <p(ang)>,
            "happiness": <p(hap)>,
            "sadness": <p(sad)>
        }
    }, ...
]
See an example below:
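As a hedged illustration (a sketch, not the repository's own example), the following Python snippet shows how a client might read a response in the format above and pick the most probable emotion for each file. The sample payload and its numbers are made up for demonstration, and `top_emotions` is a hypothetical helper name. The request itself is omitted because the endpoint path and form field name are not specified here.

```python
import json
from typing import List, Tuple

# Made-up payload in the documented response shape (values are illustrative only).
SAMPLE_RESPONSE = """
[
  {"name": "a.wav",
   "prob": {"neutral": 0.1, "anger": 0.7, "happiness": 0.15, "sadness": 0.05}},
  {"name": "b.wav",
   "prob": {"neutral": 0.6, "anger": 0.1, "happiness": 0.2, "sadness": 0.1}}
]
"""


def top_emotions(payload: str) -> List[Tuple[str, str, float]]:
    """Return (file name, most probable emotion, probability) for each entry."""
    results = json.loads(payload)
    summary = []
    for item in results:
        probs = item["prob"]
        # Pick the emotion label with the highest probability.
        label = max(probs, key=probs.get)
        summary.append((item["name"], label, probs[label]))
    return summary


if __name__ == "__main__":
    for name, label, p in top_emotions(SAMPLE_RESPONSE):
        print(f"{name}: {label} ({p:.2f})")
```

In practice the payload string would come from the body of the server's HTTP response rather than from a hard-coded sample.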
Chompakorn Chaksangchaichot
Email: [email protected]