This repository contains code for:
- text-to-speech synthesis by Suno.ai's Bark, with support for different speakers, languages, emotions, and singing
- speaker generation (aka voice cloning) by gitmylo

All of this is wrapped into a convenient REST API built with FastAPI.
The repository also simplifies setup of the bark and voice-cloning repositories, because all models are downloaded automatically.
This repository is a merge of the original bark repository and bark-voice-cloning-HuBERT-quantizer. Credit goes to the original authors. Like them, I am not responsible for any misuse of this repository; use at your own risk and act responsibly.
- Clone the repository.
- (Optional) Create a virtual environment with

  ```
  python -m venv venv
  ```

  and activate it with `venv/Scripts/activate` (Windows) or `source venv/bin/activate` (Linux/macOS).
- Install the requirements:

  ```
  pip install -r requirements.txt
  ```

- Don't forget to install the PyTorch GPU version:

  ```
  pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  ```

- Start the server either by running the provided batch file `start_server.bat`, or with

  ```
  python bark/server.py --port 8009
  ```

  In the latter case, make sure PYTHONPATH is set to the root of this repository.
- To test the server, open http://localhost:8009/docs in your browser.
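Alternatively, you can check programmatically that the server is up. FastAPI serves its OpenAPI schema at /openapi.json by default, so a simple GET request works as a smoke test:

```python
import requests

# FastAPI exposes the OpenAPI schema at /openapi.json by default
r = requests.get("http://localhost:8009/openapi.json")
print(r.status_code)  # 200 means the server is running
```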
Then make POST requests to the server with your favorite tool or library. Here are some examples of running inference with a Python client.
Note: The first time you start the server, it will download the models. This can take a while. If the download fails, you can fetch the files manually or with the model_downloader.py script.
```python
import requests

response = requests.post(
    "http://localhost:8009/text2voice",
    params={"text": "please contribute", "speaker": "en_speaker_3"},
)
```
The response is a .wav file as bytes. You can save it with:
```python
import librosa
import soundfile as sf
from io import BytesIO

# decode the returned bytes into an audio array
audio, sr = librosa.load(BytesIO(response.content))

# save to file
save_file_path = "output.wav"
sf.write(save_file_path, audio, sr)
```
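Since the server already returns the bytes of a complete .wav file, a simpler alternative is to skip the librosa round trip and write the response body straight to disk (the filename is just an example):

```python
# write the returned .wav bytes directly to disk
with open("output.wav", "wb") as f:
    f.write(response.content)
```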
To create a new speaker embedding (voice cloning), upload an audio sample:

```python
import requests

with open("myfile.wav", "rb") as f:
    audio = f.read()

response = requests.post(
    "http://localhost:8009/create_speaker_embedding",
    params={"speaker_name": "my_new_speaker"},
    files={"audio_file": audio},
)
```
The response is a .npz file as bytes. Once the embedding has been created, the new speaker can be used in text-to-speech synthesis.
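If you want a local copy of the embedding, the returned bytes can be written to disk as well (the filename below is just an example):

```python
# save the returned speaker embedding locally
with open("my_new_speaker.npz", "wb") as f:
    f.write(response.content)
```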
To run voice-to-voice conversion with an existing speaker:

```python
import requests

with open("myfile.wav", "rb") as f:
    audio = f.read()

response = requests.post(
    "http://localhost:8009/voice2voice",
    params={"speaker_name": "my_new_speaker"},
    files={"audio_file": audio},
)
```
This example assumes that a speaker named "my_new_speaker" was previously created with the create_speaker_embedding endpoint.
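Since a created embedding can be used for text-to-speech synthesis (see above), the new speaker name can presumably also be passed as the speaker parameter of the text2voice endpoint; this sketch assumes custom speaker names are accepted there just like the built-in ones:

```python
import requests

# sketch: synthesize speech with the cloned voice, assuming the
# text2voice endpoint accepts custom speaker names
response = requests.post(
    "http://localhost:8009/text2voice",
    params={"text": "hello from my cloned voice", "speaker": "my_new_speaker"},
)
```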
All settings relevant for execution are stored in settings.py and can be adjusted via environment variables. When running in Docker or a cloud environment, just set the environment variables accordingly.
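As a minimal sketch of that override pattern (the variable name SERVER_PORT is hypothetical; check settings.py for the names that are actually read):

```python
import os

# hypothetical example: read a setting from the environment with a default;
# the real variable names live in settings.py
SERVER_PORT = int(os.environ.get("SERVER_PORT", "8009"))
```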
Any help with maintaining and extending the package is welcome. Feel free to open an issue or a pull request. ToDo: make inference faster by keeping models in memory.