Speech To Text Transcription in Unreal Engine (RobCoG VR)

Different Approaches

  • Sphinx Based Plugin
  • Whisper Speech-to-Text Unreal Engine Plugin
  • REST API (Unreal Engine) – Flask

Sphinx Based Plugin

GIT : Sphinx Unreal Engine Plugin

Acoustic Model : contains a statistical representation of the distinct sounds that make up every word in the vocabulary; each sound corresponds to a phoneme.
Language Model : contains the list of words and the probability of their occurrence in sequence.
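For illustration, a minimal pocketsphinx decoding sketch (outside Unreal) showing how the acoustic model, language model, and phonetic dictionary feed the decoder. It assumes the pocketsphinx 5prealpha C API (function names differ between versions); the model and audio paths are placeholders, not the plugin's actual files.

```cpp
// Minimal pocketsphinx decoding sketch (5prealpha-style C API); paths are placeholders.
#include <pocketsphinx.h>
#include <cstdio>
#include <cstdint>

int main() {
    // Acoustic model (-hmm), language model (-lm) and phonetic dictionary (-dict),
    // i.e. the kind of files shipped in the plugin's content/model directory.
    cmd_ln_t *config = cmd_ln_init(nullptr, ps_args(), TRUE,
                                   "-hmm",  "model/en-us/en-us",
                                   "-lm",   "model/en-us/en-us.lm.bin",
                                   "-dict", "model/en-us/cmudict-en-us.dict",
                                   nullptr);
    ps_decoder_t *ps = ps_init(config);

    FILE *fh = fopen("utterance.raw", "rb");   // 16 kHz, 16-bit, mono PCM
    int16_t buf[512];
    size_t nread;

    ps_start_utt(ps);
    while ((nread = fread(buf, sizeof(int16_t), 512, fh)) > 0) {
        ps_process_raw(ps, buf, nread, FALSE, FALSE);   // feed raw samples to the decoder
    }
    ps_end_utt(ps);

    int32 score;
    const char *hyp = ps_get_hyp(ps, &score);           // best hypothesis for the utterance
    printf("Recognised: %s\n", hyp ? hyp : "(none)");

    fclose(fh);
    ps_free(ps);
    cmd_ln_free_r(config);
    return 0;
}
```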

Sphinx_folder_structure Sphinx_phonemes.jpg

Fig. (a) Folder structure inside content/model directory (b) Phonemes inside the vocab

Sphinx_blueprint1.jpg Sphinx_blueprint2.jpg

Fig. (a) Setting Probability tolerance for Recognised phrases (b) Reading and Displaying Recognised text

Sphinx_working.jpg

Fig. Overview of Sphinx plugin Speech to Text operation

Drawbacks:

  • Phonemes (vocabulary entries) have to be added manually for every word that should be recognised
  • Recognition performance is poor for phrases of two or more words

Whisper Speech-to-Text Unreal Engine Plugin

Reference : Whisper Cpp
GIT : ../blob/main/SpeechRecognition.zip

Libraries Used:

  • SDL2
  • Whisper (C++)
  • C++17 Standard Library
  • Containers : Array, Vector, Map, Set
  • Streams : fstream, iostream, sstream
  • Concurrency : thread, mutex, atomic

Whisper_plugin_build.jpg

Fig. Code snippet inside Build.cs of speech-to-text unreal engine plugin

Inside 'SpeechRecognition\Source\SpeechRecognition\Private\MySpeechWorker', the functions for recording, scaling, and filtering the audio can be found.

The processed audio is then passed to the Whisper network, which returns the transcribed text as output.
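As a rough sketch of that hand-off, assuming the whisper.cpp C API of older releases (function names may differ in newer versions); the model path is a placeholder and the buffer is expected to be mono 16 kHz float samples.

```cpp
// Sketch: transcribing a buffer of mono 16 kHz float samples with whisper.cpp.
#include "whisper.h"
#include <string>
#include <vector>

std::string TranscribeBuffer(const std::vector<float>& Samples)
{
    // Load the ggml model (placeholder path); in a plugin this would normally
    // be loaded once and reused across calls.
    whisper_context* Ctx = whisper_init_from_file("Models/ggml-base.en.bin");
    if (!Ctx) return {};

    // Greedy sampling is the simplest/fastest decoding strategy.
    whisper_full_params Params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    Params.print_progress = false;
    Params.language       = "en";

    std::string Text;
    if (whisper_full(Ctx, Params, Samples.data(), (int)Samples.size()) == 0)
    {
        // Concatenate the text of every decoded segment.
        const int NumSegments = whisper_full_n_segments(Ctx);
        for (int i = 0; i < NumSegments; ++i)
        {
            Text += whisper_full_get_segment_text(Ctx, i);
        }
    }

    whisper_free(Ctx);
    return Text;
}
```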

Whisper_plugin_blueprint.jpg

Fig. Code snippet to retrieve audio buffer and invoke whisper for transcriptions

Drawbacks:

  • Speed : transcription takes more than 8 seconds, which is too slow to be reliable
  • Accuracy : the obtained transcriptions do not always match the speaker's utterances

REST API (Unreal Engine) – Flask

GIT (USemLog) : ../USemLog/tree/SpeechRecord
GIT (Flask Python file) : ../IAI_USEMLOG_REST_Speech/blob/master/voice.py

  • Flask :

    • Flask is a lightweight framework for building web applications in Python
    • Python's PyAudio is used to read in the audio data with the required format, sample rate, etc.
    • Routes map URLs to the functions that handle the requests
    • The application listens for incoming HTTP requests and answers with the appropriate HTTP responses (the transcriptions)
  • Unreal REST API :

    • Unreal Engine's C++ HTTP module is used to raise the API requests (start and stop recording); a request sketch is shown further below
    • Unreal's JSON parsing libraries are used to parse the response received from Flask
  • VR Motion Controller Mappings :

    SL_Logger.PNG SL_Logger_inputs.PNG

    Fig. (a) Params in SL_LoggerManager used to map inputs (b) VR Trackpad buttons mapped as inputs to params in project_settings/inputs
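For orientation, a generic Unreal sketch of how trackpad actions defined under Project Settings → Input can be bound to start/stop functions. The class, function, and action names here are hypothetical and are not the plugin's actual identifiers.

```cpp
// MySpeechPawn.h -- hypothetical pawn forwarding trackpad presses to start/stop handlers.
#pragma once
#include "CoreMinimal.h"
#include "GameFramework/Pawn.h"
#include "MySpeechPawn.generated.h"

UCLASS()
class AMySpeechPawn : public APawn
{
    GENERATED_BODY()
public:
    virtual void SetupPlayerInputComponent(UInputComponent* PlayerInputComponent) override;
private:
    void OnStartSpeech();   // would raise the start-recording API request
    void OnStopSpeech();    // would raise the stop-recording API request
};

// MySpeechPawn.cpp
#include "Components/InputComponent.h"

void AMySpeechPawn::SetupPlayerInputComponent(UInputComponent* PlayerInputComponent)
{
    Super::SetupPlayerInputComponent(PlayerInputComponent);

    // Action names must match the mappings created under Project Settings -> Input
    // (right trackpad down = start, left trackpad down = stop), as in the figures above.
    PlayerInputComponent->BindAction("StartSpeechRecording", IE_Pressed, this, &AMySpeechPawn::OnStartSpeech);
    PlayerInputComponent->BindAction("StopSpeechRecording",  IE_Pressed, this, &AMySpeechPawn::OnStopSpeech);
}

void AMySpeechPawn::OnStartSpeech() { /* send the start request (see the HTTP sketch below) */ }
void AMySpeechPawn::OnStopSpeech()  { /* send the stop request (see the HTTP sketch below)  */ }
```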

Libraries and Tools Used :

  • Unreal Engine (4.27, 5.1.1)
  • USemLog Plugin
  • C++ Libraries (STL containers, JSON, HTTP)
  • PyCharm (Flask API)
  • Python Libraries (PyAudio, Whisper, os, wave, datetime, torch, threading)

Whisper_Flask_working.jpg

Fig. Overview of Speech to Text Operation in ‘RobCoG’ using FLASK API

Fig. Code snippet of variables to facilitate controller mappings

Fig. Code snippet of controller mappings to functions

Fig. Code snippet of mapped functions calls definition

Fig. Code snippet of function to send start audio signal API request

Fig. Code snippet of function to send stop audio signal API request
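As a hedged illustration of what such a request function can look like, a minimal sketch using Unreal's FHttpModule. The Build.cs would need the "Http" module, and the localhost URL and route names below are assumptions rather than the routes actually defined in voice.py.

```cpp
// Sketch: raising a start/stop recording request against the Flask server from Unreal C++.
#include "HttpModule.h"
#include "Interfaces/IHttpRequest.h"
#include "Interfaces/IHttpResponse.h"

void SendRecordingSignal(bool bStart)
{
    TSharedRef<IHttpRequest, ESPMode::ThreadSafe> Request = FHttpModule::Get().CreateRequest();

    // Assumed localhost routes; the real routes live in voice.py.
    const FString Route = bStart ? TEXT("start_recording") : TEXT("stop_recording");
    Request->SetURL(FString::Printf(TEXT("http://127.0.0.1:5000/%s"), *Route));
    Request->SetVerb(TEXT("GET"));

    Request->OnProcessRequestComplete().BindLambda(
        [](FHttpRequestPtr Req, FHttpResponsePtr Response, bool bSucceeded)
        {
            if (bSucceeded && Response.IsValid())
            {
                // The stop response carries the transcription JSON from Flask.
                UE_LOG(LogTemp, Log, TEXT("Flask replied: %s"), *Response->GetContentAsString());
            }
        });

    Request->ProcessRequest();
}
```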

Usage

  • Of the three approaches described above, the third one, REST API (Unreal Engine) – Flask, delivers satisfactory performance
  • Steps to follow:
    • Add the updated USemLog plugin (SpeechRecord branch, which contains the speech scripts) to the Unreal Engine project
    • Open the project in PyCharm and install all the packages used in the voice.py script
    • Run the Flask application (voice.py) on the localhost server
    • The server status can be checked with the Postman plugin in Chrome using the localhost server URL
    • The Flask application now listens for incoming API requests
    • Start the RobCoG project
    • Use the VR controllers to start and stop the audio recordings
    • Press the right controller's trackpad-down button to raise an API request that starts the audio recording
    • The user can then start speaking
    • The running Flask application receives the request and starts recording the user's utterances
    • Press the left controller's trackpad-down button to raise an API request that stops the audio recording
    • Alternatively, the recording is stopped automatically when the Unreal game session ends
    • On the stop request, the Flask application finishes the recording, saves the audio as a .wav file, and invokes the whisper package for transcription
    • The transcriptions are then formatted as JSON, passed back to RobCoG, and displayed to the user (see the parsing sketch below)
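For reference, a minimal sketch of pulling the transcription out of such a JSON response and displaying it on screen in Unreal. The "Json" module is required in Build.cs, and the "transcription" field name is an assumption, not taken from voice.py.

```cpp
// Sketch: extracting the transcription text from the Flask JSON response and
// printing it on screen. The "transcription" key is assumed.
#include "Dom/JsonObject.h"
#include "Serialization/JsonReader.h"
#include "Serialization/JsonSerializer.h"
#include "Engine/Engine.h"

void ShowTranscription(const FString& ResponseBody)
{
    TSharedPtr<FJsonObject> Json;
    TSharedRef<TJsonReader<>> Reader = TJsonReaderFactory<>::Create(ResponseBody);

    if (FJsonSerializer::Deserialize(Reader, Json) && Json.IsValid())
    {
        FString Text;
        if (Json->TryGetStringField(TEXT("transcription"), Text) && GEngine)
        {
            // Display the recognised phrase to the user for a few seconds.
            GEngine->AddOnScreenDebugMessage(-1, 5.f, FColor::Green, Text);
        }
    }
}
```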
