A simple proof of concept for a server that accepts WebRTC connections through a mediasoup SFU and transcribes the audio using the Google Speech-to-Text API.
You need to acquire a Google credential file to use the Speech-to-Text API: https://cloud.google.com/speech-to-text
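If you use a service account, one way to obtain the JSON key is through the `gcloud` CLI; the account and project names below are placeholders, and downloading the key from the Cloud Console works just as well:

```sh
# Assumes a GCP project with the Speech-to-Text API enabled and an existing
# service account; replace the account and project names with your own.
gcloud iam service-accounts keys create ./keys/google-credentials.json \
  --iam-account=my-transcriber@my-project.iam.gserviceaccount.com
```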
Once you have the credential file, copy it into a local folder (for example `./keys`) and point `docker-compose.yaml` at it:
- Set `GOOGLE_APPLICATION_CREDENTIALS` inside `docker-compose.yaml` to point to the file containing your credentials (make sure the `mount` section also mounts the proper directory if you put the file somewhere other than `./keys`); see the sketch after this list.
- Set the `announcedIp` to the IP address assigned to your computer (mac/linux: `ifconfig`, windows: `ipconfig`).
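For orientation, here is a minimal sketch of how those two settings might look in `docker-compose.yaml`. The service name, container paths, and the use of an environment variable for the announced IP are assumptions; adapt them to the file shipped with this repo.

```yaml
services:
  server:
    build: .
    environment:
      # Path *inside the container* to your Google credential file
      GOOGLE_APPLICATION_CREDENTIALS: /keys/google-credentials.json
      # IP assigned to your machine (mediasoup announcedIp); whether this is
      # read from an env var or set directly in the code is an assumption here
      ANNOUNCED_IP: 192.168.1.42
    volumes:
      # Mount the local ./keys folder so the credential file is visible inside
      # the container (adjust if you keep the key elsewhere)
      - ./keys:/keys:ro
    ports:
      - "5959:5959"
```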
Start the server in docker with `docker-compose up`. Building the image takes a while (5-20 minutes). When the image is built, open http://localhost:5959/rooms/yourRoom in two different tabs, mute yourself in one of them, and speak. You should see the transcriptions appear in the tab, along with who (which userId) said them.
The server uses GStreamer to build a pipeline that converts the encoded audio to wav, so for local development you need GStreamer installed on your machine together with the good, bad, and ugly plugin sets.
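For example, on Debian/Ubuntu the tools and plugin sets can be installed with apt (package names on other platforms differ, and the server may not need every one of these plugin sets):

```sh
sudo apt-get install gstreamer1.0-tools \
    gstreamer1.0-plugins-base \
    gstreamer1.0-plugins-good \
    gstreamer1.0-plugins-bad \
    gstreamer1.0-plugins-ugly
```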
Once you have it, run `npm run dev` inside the server directory, which starts the server in development mode.
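To get a feel for what such a pipeline looks like, here is a hand-written `gst-launch-1.0` equivalent that receives Opus over RTP, decodes it, and writes wav. The port, payload type, and output file are placeholders for illustration; the pipeline the server actually builds may differ.

```sh
gst-launch-1.0 udpsrc port=5004 \
    caps="application/x-rtp,media=audio,encoding-name=OPUS,clock-rate=48000,payload=100" \
  ! rtpjitterbuffer \
  ! rtpopusdepay \
  ! opusdec \
  ! audioconvert \
  ! audioresample \
  ! wavenc \
  ! filesink location=/tmp/audio.wav
```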
The webpage is a simple HTML5 page with a stylesheet; it loads a single bundled JavaScript file that contains all dependencies. To regenerate the bundled index js, run `browserify index.js -o ../server/src/bundled-index.js` inside the webpage directory.
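If you rebuild the bundle often, it can be convenient to wrap that command in an npm script inside `webpage/package.json` (a hypothetical convenience, not necessarily something the repo ships):

```json
{
  "scripts": {
    "bundle": "browserify index.js -o ../server/src/bundled-index.js"
  }
}
```

The bundle can then be regenerated with `npm run bundle`.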
Contributions are welcome. Here are some ideas if you want to extend the functionality:
- Integrate any other widely available speech-to-text API (IBM Watson, Vosk, whatever...); see the interface sketch after this list
- Better utilization of the transcription on the front end (a fancier UI)
- Add an NLP module to respond to voice commands
- Make a separate React app for the client side
- Improve the underlying GStreamer audio converter pipeline in the server in order to improve the overall transcription quality
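On the first idea, a hypothetical shape for a pluggable transcriber (not part of this repo) could look like the sketch below: the Google Speech specific code would live behind an interface like this, so other engines become drop-in replacements.

```js
const { EventEmitter } = require('events');

// Hypothetical base class: concrete engines (GoogleTranscriber,
// VoskTranscriber, ...) would extend it.
class Transcriber extends EventEmitter {
  // audioChunk: Buffer of wav/PCM audio produced by the GStreamer pipeline
  pushAudio(audioChunk) {
    throw new Error('implement in a subclass');
  }
  // Subclasses emit results as: this.emit('transcription', { userId, text })
}

module.exports = { Transcriber };
```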
Apache-2.0