- Sphinx Based Plugin
- Whisper Speech-to-Text Unreal Engine Plugin
- REST API (Unreal Engine) – Flask
GIT : Sphinx Unreal Engine Plugin
Acoustic Model : Contains a statistical representation of the distinct sounds making up each word in the vocabulary; each sound corresponds to a phoneme
Language Model : Contains the list of words and the probability of their occurring in sequence
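The vocabulary entries the acoustic model maps onto are plain word-to-phoneme lines in a CMU-style pronunciation dictionary. The entries below are illustrative only; the actual vocab shipped inside the content/model directory of the plugin may differ.

```
; CMU-style .dic entries: WORD followed by its phoneme sequence
HELLO    HH AH L OW
PICK     P IH K
UP       AH P
CUP      K AH P
```

Each phoneme here (HH, AH, L, ...) must have a statistical representation in the acoustic model, and the word itself must appear in the language model, for recognition to work.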
Fig. (a) Folder structure inside content/model directory (b) Phonemes inside the vocab
Fig. (a) Setting Probability tolerance for Recognised phrases (b) Reading and Displaying Recognised text
Fig. Overview of Sphinx plugin Speech to Text operation
Drawbacks:
- Phonemes must be added to the vocabulary for words to be recognised
- Recognition performance degrades noticeably for phrases of two or more words
Reference : Whisper Cpp
GIT : ../blob/main/SpeechRecognition.zip
Libraries Used:
- SDL2
- Whisper (C++)
- Standard Library C++ 17
- Containers : Array, Vector, Map, Set
- Streams : fstream, iostream, sstream
- Concurrency : thread, mutex, atomic
Fig. Code snippet inside Build.cs of speech-to-text unreal engine plugin
Inside 'SpeechRecognition\Source\SpeechRecognition\Private\MySpeechWorker', the functions for recording, scaling, and filtering audio are defined.
The processed audio is passed on to the Whisper network, which outputs the transcribed text.
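The scaling step in MySpeechWorker is done in C++, but the idea can be sketched in Python: microphone samples arrive as signed 16-bit PCM and must be scaled to floats in [-1.0, 1.0] before being handed to Whisper. The DC-offset removal below is an illustrative stand-in for the plugin's filtering, not its actual filter.

```python
import struct

def pcm16_to_float(raw: bytes) -> list:
    """Scale signed 16-bit PCM bytes to floats in [-1.0, 1.0],
    the sample range whisper.cpp expects for its input buffer."""
    n = len(raw) // 2
    samples = struct.unpack("<%dh" % n, raw)
    return [s / 32768.0 for s in samples]

def remove_dc_offset(samples: list) -> list:
    """A minimal 'filter': subtract the mean so the signal is centred on zero."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]

# Two raw samples: 0 and +32767 (the maximum positive int16 value)
floats = pcm16_to_float(struct.pack("<2h", 0, 32767))
```

The scaled, filtered buffer is what the plugin passes to the Whisper network for transcription.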
Fig. Code snippet to retrieve audio buffer and invoke whisper for transcriptions
Drawbacks:
- Speed : Transcription takes more than 8 seconds, which is too slow to be reliable
- Accuracy : The obtained transcriptions do not always match the speaker's utterances
GIT (USemLog) : ../USemLog/tree/SpeechRecord
GIT (Flask Python file) : ../IAI_USEMLOG_REST_Speech/blob/master/voice.py
FLASK :
- Flask is a lightweight framework for building web applications in Python
- Used Python's PyAudio to read in audio data with the required format, sample rate, etc.
- Used routes to map URLs to the functions that handle the requests
- Listened for incoming HTTP requests and responded with the appropriate HTTP responses (transcriptions)
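The Flask side described above can be sketched as below. The route names /start and /stop and the JSON keys are assumptions for illustration; the actual routes in voice.py may differ.

```python
from flask import Flask, jsonify

app = Flask(__name__)
recording = {"active": False}

@app.route("/start", methods=["POST"])
def start():
    # In voice.py this is where the PyAudio capture thread would be launched
    recording["active"] = True
    return jsonify({"status": "recording started"})

@app.route("/stop", methods=["POST"])
def stop():
    # Here the capture stops, the .wav is saved, and Whisper is invoked;
    # the placeholder stands in for the real transcription text
    recording["active"] = False
    return jsonify({"status": "stopped", "transcription": "<whisper output>"})

# app.run(host="127.0.0.1", port=5000)  # uncomment to serve on localhost
```

Unreal's HTTP requests hit these routes, and the JSON body of the /stop response carries the transcription back to the game.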
Unreal REST API :
- Used Unreal Engine's C++ HTTP modules to issue API requests (start and stop recording)
- Used JSON parsing libraries to parse the response received from Flask
VR Motion Controller Mappings :
Fig. (a) Params in SL_LoggerManager used to map inputs (b) VR Trackpad buttons mapped as inputs to params in project_settings/inputs
Libraries and Tools Used :
- Unreal Engine (4.27, 5.1.1)
- USemLog Plugin
- C++ Libraries (STL:Containers, JSON, HTTP)
- PyCharm (Flask API)
- Python Libraries (PyAudio, Whisper, os, wave, datetime, torch, threading)
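Of the Python libraries listed, PyAudio captures the frames and the stdlib wave module writes them to disk for Whisper. A minimal sketch of the saving step, assuming typical capture parameters (voice.py's actual values may differ):

```python
import wave

# Assumed capture parameters; voice.py may use different values
CHANNELS = 1
SAMPLE_WIDTH = 2   # bytes per sample, i.e. 16-bit PCM
RATE = 16000       # Whisper models expect 16 kHz input

def save_wav(path, frames):
    """Write captured audio frames (a list of byte strings,
    as returned by PyAudio's stream.read) to a .wav file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(RATE)
        wf.writeframes(b"".join(frames))

# One second of silence as stand-in frames
save_wav("utterance.wav", [b"\x00\x00" * RATE])
```

The resulting .wav file is the input handed to the whisper package for transcription.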
Fig. Overview of Speech to Text Operation in ‘RobCoG’ using FLASK API
Fig. Code snippet of variables to facilitate controller mappings
Fig. Code snippet of controller mappings to functions
Fig. Code snippet of mapped functions calls definition
Fig. Code snippet of function to send start audio signal API request
Fig. Code snippet of function to send stop audio signal API request
- Of the three approaches described above, the third one, REST API (Unreal Engine) – Flask, delivers satisfactory performance
- Steps to follow:
- Get the updated USemLog plugin, with the speech scripts, into the Unreal Engine project
- Open the project in PyCharm and install all the packages used in the voice.py script
- Run the Flask application (the voice.py script) on the localhost server
- The server status can be checked with the Postman extension for Chrome using the localhost server URL
- The Flask application now listens for API requests
- Start the RobCoG project
- Use the VR controllers to start and stop audio recordings
- Click the right controller's trackpad-down button to raise an API request to start the audio recording
- The user should then start speaking
- The running Flask application receives the request and starts recording the user's utterances
- Click the left controller's trackpad-down button to raise an API request to stop the audio recording
- Alternatively, the recording also stops when the Unreal game session ends
- Upon the stop request, the Flask application finishes recording, saves the audio as a .wav file, and invokes the Whisper package for transcription
- The transcriptions are then formatted as JSON, passed back to RobCoG, and displayed to the user.
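The JSON hand-off in the last step can be illustrated as follows. On the Unreal side the parsing is done with the engine's C++ JSON module; the Python equivalent is shown here, and the key name "transcription" is an assumption about the payload shape, not taken from voice.py.

```python
import json

# Hypothetical shape of the Flask response body
response_body = json.dumps({"transcription": "pick up the cup"})

# RobCoG parses the response and extracts the text to display to the user
payload = json.loads(response_body)
text = payload["transcription"]
```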