Improve CLI handling of boolean arguments #38

Merged
merged 4 commits into from
Jun 14, 2024
11 changes: 8 additions & 3 deletions README.md
@@ -4,10 +4,10 @@
- 📣 ⓍTTSv2 is here with 16 languages and better performance across the board.
- 📣 ⓍTTS fine-tuning code is out. Check the [example recipes](https://github.com/idiap/coqui-ai-TTS/tree/dev/recipes/ljspeech).
- 📣 ⓍTTS can now stream with <200ms latency.
- 📣 ⓍTTS, our production TTS model that can speak 13 languages, is released [Blog Post](https://coqui.ai/blog/tts/open_xtts), [Demo](https://huggingface.co/spaces/coqui/xtts), [Docs](https://coqui-tts.readthedocs.io/en/dev/models/xtts.html)
- 📣 [🐶Bark](https://github.com/suno-ai/bark) is now available for inference with unconstrained voice cloning. [Docs](https://coqui-tts.readthedocs.io/en/dev/models/bark.html)
- 📣 ⓍTTS, our production TTS model that can speak 13 languages, is released [Blog Post](https://coqui.ai/blog/tts/open_xtts), [Demo](https://huggingface.co/spaces/coqui/xtts), [Docs](https://coqui-tts.readthedocs.io/en/latest/models/xtts.html)
- 📣 [🐶Bark](https://github.com/suno-ai/bark) is now available for inference with unconstrained voice cloning. [Docs](https://coqui-tts.readthedocs.io/en/latest/models/bark.html)
- 📣 You can use [~1100 Fairseq models](https://github.com/facebookresearch/fairseq/tree/main/examples/mms) with 🐸TTS.
- 📣 🐸TTS now supports 🐢Tortoise with faster inference. [Docs](https://coqui-tts.readthedocs.io/en/dev/models/tortoise.html)
- 📣 🐸TTS now supports 🐢Tortoise with faster inference. [Docs](https://coqui-tts.readthedocs.io/en/latest/models/tortoise.html)

<div align="center">
<img src="https://static.scarf.sh/a.png?x-pxid=cf317fe7-2188-4721-bc01-124bb5d5dbb2" />
@@ -55,6 +55,10 @@ Please use our dedicated channels for questions and discussion. Help is much mor
[discord]: https://discord.gg/5eXr5seRrv
[Tutorials and Examples]: https://github.com/coqui-ai/TTS/wiki/TTS-Notebooks-and-Tutorials

The [issues](https://github.com/coqui-ai/TTS/issues) and
[discussions](https://github.com/coqui-ai/TTS/discussions) in the original
repository are also still a useful source of information.


## 🔗 Links and Resources
| Type | Links |
@@ -143,6 +147,7 @@ If you plan to code or train models, clone 🐸TTS and install it locally.

```bash
git clone https://github.com/idiap/coqui-ai-TTS
cd coqui-ai-TTS
pip install -e .
```

4 changes: 2 additions & 2 deletions TTS/bin/compute_attention_masks.py
@@ -35,7 +35,7 @@
--data_path /root/LJSpeech-1.1/
--batch_size 32
--dataset ljspeech
--use_cuda True
--use_cuda
""",
formatter_class=RawTextHelpFormatter,
)
@@ -62,7 +62,7 @@
help="Dataset metafile inclusing file paths with transcripts.",
)
parser.add_argument("--data_path", type=str, default="", help="Defines the data path. It overwrites config.json.")
parser.add_argument("--use_cuda", type=bool, default=False, help="enable/disable cuda.")
parser.add_argument("--use_cuda", action=argparse.BooleanOptionalAction, default=False, help="enable/disable cuda.")

parser.add_argument(
"--batch_size", default=16, type=int, help="Batch size for the model. Use batch_size=1 if you have no CUDA."
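Reviewer note on the `--use_cuda` change above: with `type=bool`, argparse calls `bool()` on the raw string, so any non-empty value (including `False`) parses as `True`. The sketch below is illustrative only; the parser objects `old` and `new` are made up for the comparison and are not part of the repository. `argparse.BooleanOptionalAction` requires Python 3.9 or newer.

```python
import argparse

# Old declaration: argparse applies bool() to the raw string,
# and bool("False") is True because the string is non-empty.
old = argparse.ArgumentParser()
old.add_argument("--use_cuda", type=bool, default=False)

# New declaration (Python >= 3.9): argparse registers both
# --use_cuda and --no-use_cuda, neither of which takes a value.
new = argparse.ArgumentParser()
new.add_argument("--use_cuda", action=argparse.BooleanOptionalAction, default=False)

old_result = old.parse_args(["--use_cuda", "False"]).use_cuda
new_on = new.parse_args(["--use_cuda"]).use_cuda
new_off = new.parse_args(["--no-use_cuda"]).use_cuda
new_default = new.parse_args([]).use_cuda

print(old_result, new_on, new_off, new_default)
```

The old parser reports CUDA as enabled even though the user asked for `False`, which is the behaviour this PR removes.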
2 changes: 1 addition & 1 deletion TTS/bin/compute_embeddings.py
@@ -150,7 +150,7 @@ def compute_embeddings(
default=False,
action="store_true",
)
parser.add_argument("--disable_cuda", type=bool, help="Flag to disable cuda.", default=False)
parser.add_argument("--disable_cuda", action="store_true", help="Flag to disable cuda.", default=False)
parser.add_argument("--no_eval", help="Do not compute eval?. Default False", default=False, action="store_true")
parser.add_argument(
"--formatter_name",
4 changes: 2 additions & 2 deletions TTS/bin/eval_encoder.py
@@ -75,8 +75,8 @@ def compute_encoder_accuracy(dataset_items, encoder_manager):
type=str,
help="Path to dataset config file.",
)
parser.add_argument("--use_cuda", type=bool, help="flag to set cuda.", default=True)
parser.add_argument("--eval", type=bool, help="compute eval.", default=True)
parser.add_argument("--use_cuda", action=argparse.BooleanOptionalAction, help="flag to set cuda.", default=True)
parser.add_argument("--eval", action=argparse.BooleanOptionalAction, help="compute eval.", default=True)

args = parser.parse_args()
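
A side effect worth noting for flags that default to `True`, such as `--use_cuda` and `--eval` here: the value form (`--eval False`) is no longer accepted, and the way to switch the flag off is the generated `--no-eval`. A minimal sketch, assuming a stand-alone parser with only this one flag; the helper `parse_eval` is hypothetical and exists only for the demonstration:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--eval", action=argparse.BooleanOptionalAction, default=True)


def parse_eval(argv):
    """Return the parsed --eval value, or None if argparse rejects argv."""
    try:
        return parser.parse_args(argv).eval
    except SystemExit:
        # argparse prints a usage error and exits on unrecognized arguments.
        return None


default_value = parse_eval([])                 # flag omitted: default applies
disabled = parse_eval(["--no-eval"])           # generated negative form
old_style = parse_eval(["--eval", "False"])    # old value form is now rejected
```

Scripts or docs that still pass `--eval False` would need to switch to `--no-eval`.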

2 changes: 1 addition & 1 deletion TTS/bin/extract_tts_spectrograms.py
@@ -282,7 +282,7 @@ def main(args): # pylint: disable=redefined-outer-name
parser.add_argument("--debug", default=False, action="store_true", help="Save audio files for debug")
parser.add_argument("--save_audio", default=False, action="store_true", help="Save audio files")
parser.add_argument("--quantize_bits", type=int, default=0, help="Save quantized audio files if non-zero")
parser.add_argument("--eval", type=bool, help="compute eval.", default=True)
parser.add_argument("--eval", action=argparse.BooleanOptionalAction, help="compute eval.", default=True)
args = parser.parse_args()

c = load_config(args.config_path)
10 changes: 5 additions & 5 deletions TTS/bin/remove_silence_using_vad.py
@@ -80,7 +80,7 @@ def preprocess_audios():
setup_logger("TTS", level=logging.INFO, screen=True, formatter=ConsoleFormatter())

parser = argparse.ArgumentParser(
description="python TTS/bin/remove_silence_using_vad.py -i=VCTK-Corpus/ -o=VCTK-Corpus-removed-silence/ -g=wav48_silence_trimmed/*/*_mic1.flac --trim_just_beginning_and_end True"
description="python TTS/bin/remove_silence_using_vad.py -i=VCTK-Corpus/ -o=VCTK-Corpus-removed-silence/ -g=wav48_silence_trimmed/*/*_mic1.flac --trim_just_beginning_and_end"
)
parser.add_argument("-i", "--input_dir", type=str, help="Dataset root dir", required=True)
parser.add_argument("-o", "--output_dir", type=str, help="Output Dataset dir", default="")
@@ -95,20 +95,20 @@ def preprocess_audios():
parser.add_argument(
"-t",
"--trim_just_beginning_and_end",
type=bool,
action=argparse.BooleanOptionalAction,
default=True,
help="If True this script will trim just the beginning and end nonspeech parts. If False all nonspeech parts will be trim. Default True",
help="If True this script will trim just the beginning and end nonspeech parts. If False all nonspeech parts will be trimmed.",
)
parser.add_argument(
"-c",
"--use_cuda",
type=bool,
action=argparse.BooleanOptionalAction,
default=False,
help="If True use cuda",
)
parser.add_argument(
"--use_onnx",
type=bool,
action=argparse.BooleanOptionalAction,
default=False,
help="If True use onnx",
)
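
One detail of `BooleanOptionalAction` that matters for the short options `-t` and `-c`: argparse generates the negated `--no-...` form only for long options, so there is no short way to turn a flag off. A small sketch, assuming a parser that declares just these two flags the way the hunk above does:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "-t",
    "--trim_just_beginning_and_end",
    action=argparse.BooleanOptionalAction,
    default=True,
)
parser.add_argument(
    "-c",
    "--use_cuda",
    action=argparse.BooleanOptionalAction,
    default=False,
)

defaults = parser.parse_args([])
# The short option behaves like the positive long form.
short_on = parser.parse_args(["-c"])
# Disabling a default-True flag requires the long negated form.
trim_off = parser.parse_args(["--no-trim_just_beginning_and_end"])
```

Since `--trim_just_beginning_and_end` already defaults to `True`, passing it explicitly (as in the script's description string) does not change the result.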
57 changes: 19 additions & 38 deletions TTS/bin/synthesize.py
@@ -1,5 +1,6 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""Command line interface."""

import argparse
import contextlib
@@ -136,30 +137,16 @@
"""


def str2bool(v):
if isinstance(v, bool):
return v
if v.lower() in ("yes", "true", "t", "y", "1"):
return True
if v.lower() in ("no", "false", "f", "n", "0"):
return False
raise argparse.ArgumentTypeError("Boolean value expected.")


def main():
setup_logger("TTS", level=logging.INFO, screen=True, formatter=ConsoleFormatter())

def parse_args() -> argparse.Namespace:
"""Parse arguments."""
parser = argparse.ArgumentParser(
description=description.replace(" ```\n", ""),
formatter_class=RawTextHelpFormatter,
)

parser.add_argument(
"--list_models",
type=str2bool,
nargs="?",
const=True,
default=False,
action="store_true",
help="list available pre-trained TTS and vocoder models.",
)
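
For reviewers comparing the two declarations of `--list_models`: the removed `str2bool` / `nargs="?"` / `const=True` combination and the new `store_true` agree when the flag is omitted or passed bare; only the old form also accepted an explicit value. The sketch below reproduces `str2bool` from the diff; the parser objects `old` and `new` are illustrative and not repository code.

```python
import argparse


def str2bool(v):
    # Helper removed by this PR, reproduced from the diff for comparison.
    if isinstance(v, bool):
        return v
    if v.lower() in ("yes", "true", "t", "y", "1"):
        return True
    if v.lower() in ("no", "false", "f", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("Boolean value expected.")


old = argparse.ArgumentParser()
old.add_argument("--list_models", type=str2bool, nargs="?", const=True, default=False)

new = argparse.ArgumentParser()
new.add_argument("--list_models", action="store_true")

# Omitted flag and bare flag behave identically in both parsers.
same_when_omitted = old.parse_args([]).list_models == new.parse_args([]).list_models
same_when_bare = (
    old.parse_args(["--list_models"]).list_models
    == new.parse_args(["--list_models"]).list_models
)

# Only the old parser accepted an explicit value after the flag.
old_explicit_off = old.parse_args(["--list_models", "no"]).list_models
```

With `store_true`, an invocation such as `--list_models no` is rejected as an unrecognized argument instead of being interpreted.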

@@ -207,7 +194,7 @@ def main():
default="tts_output.wav",
help="Output wav file path.",
)
parser.add_argument("--use_cuda", type=bool, help="Run model on CUDA.", default=False)
parser.add_argument("--use_cuda", action="store_true", help="Run model on CUDA.")
parser.add_argument("--device", type=str, help="Device to run model on.", default="cpu")
parser.add_argument(
"--vocoder_path",
@@ -226,10 +213,7 @@ def main():
parser.add_argument(
"--pipe_out",
help="stdout the generated TTS wav file for shell pipe.",
type=str2bool,
nargs="?",
const=True,
default=False,
action="store_true",
)

# args for multi-speaker synthesis
@@ -261,25 +245,18 @@ def main():
parser.add_argument(
"--list_speaker_idxs",
help="List available speaker ids for the defined multi-speaker model.",
type=str2bool,
nargs="?",
const=True,
default=False,
action="store_true",
)
parser.add_argument(
"--list_language_idxs",
help="List available language ids for the defined multi-lingual model.",
type=str2bool,
nargs="?",
const=True,
default=False,
action="store_true",
)
# aux args
parser.add_argument(
"--save_spectogram",
type=bool,
help="If true save raw spectogram for further (vocoder) processing in out_path.",
default=False,
action="store_true",
help="Save raw spectogram for further (vocoder) processing in out_path.",
)
parser.add_argument(
"--reference_wav",
@@ -295,8 +272,8 @@ def main():
)
parser.add_argument(
"--progress_bar",
type=str2bool,
help="If true shows a progress bar for the model download. Defaults to True",
action=argparse.BooleanOptionalAction,
help="Show a progress bar for the model download.",
default=True,
)

@@ -337,19 +314,23 @@ def main():
]
if not any(check_args):
parser.parse_args(["-h"])
return args


def main():
setup_logger("TTS", level=logging.INFO, screen=True, formatter=ConsoleFormatter())
args = parse_args()

pipe_out = sys.stdout if args.pipe_out else None

with contextlib.redirect_stdout(None if args.pipe_out else sys.stdout):
# Late-import to make things load faster
from TTS.api import TTS
from TTS.utils.manage import ModelManager
from TTS.utils.synthesizer import Synthesizer

# load model manager
path = Path(__file__).parent / "../.models.json"
manager = ModelManager(path, progress_bar=args.progress_bar)
api = TTS()

tts_path = None
tts_config_path = None
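
Splitting `parse_args()` out of `main()` also makes the argument handling easier to exercise in isolation. The sketch below is a reduced stand-in, not the function from this PR: it declares only four of the flags, checks a shortened list of "actionable" arguments, and adds a hypothetical `argv` parameter (the real `parse_args()` takes none) so it can be called without touching `sys.argv`.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Reduced, illustrative version of the CLI parser."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--text", type=str, default=None)
    parser.add_argument("--list_models", action="store_true")
    parser.add_argument("--pipe_out", action="store_true")
    parser.add_argument(
        "--progress_bar", action=argparse.BooleanOptionalAction, default=True
    )
    args = parser.parse_args(argv)

    # As in the diff: show the help text and exit when nothing actionable
    # was requested (the real check covers more arguments than these two).
    if not any([args.text, args.list_models]):
        parser.parse_args(["-h"])
    return args


args = parse_args(["--text", "Hello.", "--no-progress_bar"])
```

A unit test could call such a function with explicit argument lists and assert on the returned namespace.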
2 changes: 1 addition & 1 deletion TTS/encoder/README.md
@@ -14,5 +14,5 @@ To run the code, you need to follow the same flow as in TTS.

- Define 'config.json' for your needs. Note that, audio parameters should match your TTS model.
- Example training call ```python speaker_encoder/train.py --config_path speaker_encoder/config.json --data_path ~/Data/Libri-TTS/train-clean-360```
- Generate embedding vectors ```python speaker_encoder/compute_embeddings.py --use_cuda true /model/path/best_model.pth model/config/path/config.json dataset/path/ output_path``` . This code parses all .wav files at the given dataset path and generates the same folder structure under the output path with the generated embedding files.
- Generate embedding vectors ```python speaker_encoder/compute_embeddings.py --use_cuda /model/path/best_model.pth model/config/path/config.json dataset/path/ output_path``` . This code parses all .wav files at the given dataset path and generates the same folder structure under the output path with the generated embedding files.
- Watch training on Tensorboard as in TTS
2 changes: 1 addition & 1 deletion TTS/server/README.md
@@ -15,7 +15,7 @@ Run the server with the official models.
```python TTS/server/server.py --model_name tts_models/en/ljspeech/tacotron2-DCA --vocoder_name vocoder_models/en/ljspeech/multiband-melgan```

Run the server with the official models on a GPU.
```CUDA_VISIBLE_DEVICES="0" python TTS/server/server.py --model_name tts_models/en/ljspeech/tacotron2-DCA --vocoder_name vocoder_models/en/ljspeech/multiband-melgan --use_cuda True```
```CUDA_VISIBLE_DEVICES="0" python TTS/server/server.py --model_name tts_models/en/ljspeech/tacotron2-DCA --vocoder_name vocoder_models/en/ljspeech/multiband-melgan --use_cuda```

Run the server with a custom models.
```python TTS/server/server.py --tts_checkpoint /path/to/tts/model.pth --tts_config /path/to/tts/config.json --vocoder_checkpoint /path/to/vocoder/model.pth --vocoder_config /path/to/vocoder/config.json```
6 changes: 3 additions & 3 deletions TTS/tts/models/xtts.py
@@ -698,12 +698,12 @@ def inference_stream(

def forward(self):
raise NotImplementedError(
"XTTS has a dedicated trainer, please check the XTTS docs: https://coqui-tts.readthedocs.io/en/dev/models/xtts.html#training"
"XTTS has a dedicated trainer, please check the XTTS docs: https://coqui-tts.readthedocs.io/en/latest/models/xtts.html#training"
)

def eval_step(self):
raise NotImplementedError(
"XTTS has a dedicated trainer, please check the XTTS docs: https://coqui-tts.readthedocs.io/en/dev/models/xtts.html#training"
"XTTS has a dedicated trainer, please check the XTTS docs: https://coqui-tts.readthedocs.io/en/latest/models/xtts.html#training"
)

@staticmethod
@@ -792,5 +792,5 @@ def load_checkpoint(

def train_step(self):
raise NotImplementedError(
"XTTS has a dedicated trainer, please check the XTTS docs: https://coqui-tts.readthedocs.io/en/dev/models/xtts.html#training"
"XTTS has a dedicated trainer, please check the XTTS docs: https://coqui-tts.readthedocs.io/en/latest/models/xtts.html#training"
)
2 changes: 1 addition & 1 deletion TTS/utils/generic_utils.py
@@ -126,7 +126,7 @@ def format_aux_input(def_args: Dict, kwargs: Dict) -> Dict:


def get_timestamp() -> str:
return datetime.now().strftime("%y%m%d-%H%M%S")
return datetime.datetime.now().strftime("%y%m%d-%H%M%S")
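
The `get_timestamp` change is only correct if the module imports the `datetime` module rather than the `datetime` class. The import line is not part of this hunk, so that is an assumption here. A minimal stand-alone sketch of the new form:

```python
import datetime


def get_timestamp() -> str:
    # With `import datetime`, the class has to be qualified as
    # datetime.datetime; a bare datetime.now() would raise AttributeError.
    return datetime.datetime.now().strftime("%y%m%d-%H%M%S")


stamp = get_timestamp()
```

The result is a 13-character string of the form `YYMMDD-HHMMSS`.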


class ConsoleFormatter(logging.Formatter):
4 changes: 2 additions & 2 deletions docs/source/docker_images.md
@@ -32,7 +32,7 @@ For the GPU version, you need to have the latest NVIDIA drivers installed.
With `nvidia-smi` you can check the CUDA version supported, it must be >= 11.8

```bash
docker run --rm --gpus all -v ~/tts-output:/root/tts-output ghcr.io/coqui-ai/tts --text "Hello." --out_path /root/tts-output/hello.wav --use_cuda true
docker run --rm --gpus all -v ~/tts-output:/root/tts-output ghcr.io/coqui-ai/tts --text "Hello." --out_path /root/tts-output/hello.wav --use_cuda
```

## Start a server
@@ -50,7 +50,7 @@ python3 TTS/server/server.py --model_name tts_models/en/vctk/vits
```bash
docker run --rm -it -p 5002:5002 --gpus all --entrypoint /bin/bash ghcr.io/coqui-ai/tts
python3 TTS/server/server.py --list_models #To get the list of available models
python3 TTS/server/server.py --model_name tts_models/en/vctk/vits --use_cuda true
python3 TTS/server/server.py --model_name tts_models/en/vctk/vits --use_cuda
```

Click [there](http://[::1]:5002/) and have fun with the server!
6 changes: 2 additions & 4 deletions docs/source/models/bark.md
@@ -69,14 +69,12 @@ tts --model_name tts_models/multilingual/multi-dataset/bark \
--text "This is an example." \
--out_path "output.wav" \
--voice_dir bark_voices/ \
--speaker_idx "ljspeech" \
--progress_bar True
--speaker_idx "ljspeech"

# Random voice generation
tts --model_name tts_models/multilingual/multi-dataset/bark \
--text "This is an example." \
--out_path "output.wav" \
--progress_bar True
--out_path "output.wav"
```


6 changes: 2 additions & 4 deletions docs/source/models/tortoise.md
@@ -57,14 +57,12 @@ tts --model_name tts_models/en/multi-dataset/tortoise-v2 \
--text "This is an example." \
--out_path "output.wav" \
--voice_dir path/to/tortoise/voices/dir/ \
--speaker_idx "lj" \
--progress_bar True
--speaker_idx "lj"

# Random voice generation
tts --model_name tts_models/en/multi-dataset/tortoise-v2 \
--text "This is an example." \
--out_path "output.wav" \
--progress_bar True
--out_path "output.wav"
```


8 changes: 4 additions & 4 deletions docs/source/models/xtts.md
@@ -72,7 +72,7 @@ You can do inference using one of the available speakers using the following com
--text "It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent." \
--speaker_idx "Ana Florence" \
--language_idx en \
--use_cuda true
--use_cuda
```

##### Clone a voice
@@ -85,7 +85,7 @@ You can clone a speaker voice using a single or multiple references:
--text "Bugün okula gitmek istemiyorum." \
--speaker_wav /path/to/target/speaker.wav \
--language_idx tr \
--use_cuda true
--use_cuda
```

###### Multiple references
@@ -94,7 +94,7 @@ You can clone a speaker voice using a single or multiple references:
--text "Bugün okula gitmek istemiyorum." \
--speaker_wav /path/to/target/speaker.wav /path/to/target/speaker_2.wav /path/to/target/speaker_3.wav \
--language_idx tr \
--use_cuda true
--use_cuda
```
or for all wav files in a directory you can use:

@@ -103,7 +103,7 @@ or for all wav files in a directory you can use:
--text "Bugün okula gitmek istemiyorum." \
--speaker_wav /path/to/target/*.wav \
--language_idx tr \
--use_cuda true
--use_cuda
```

#### 🐸TTS API
2 changes: 1 addition & 1 deletion recipes/ljspeech/fast_pitch/train_fast_pitch.py
@@ -65,7 +65,7 @@
model_path, config_path, _ = manager.download_model("tts_models/en/ljspeech/tacotron2-DCA")
# TODO: make compute_attention python callable
os.system(
f"python TTS/bin/compute_attention_masks.py --model_path {model_path} --config_path {config_path} --dataset ljspeech --dataset_metafile metadata.csv --data_path ./recipes/ljspeech/LJSpeech-1.1/ --use_cuda true"
f"python TTS/bin/compute_attention_masks.py --model_path {model_path} --config_path {config_path} --dataset ljspeech --dataset_metafile metadata.csv --data_path ./recipes/ljspeech/LJSpeech-1.1/ --use_cuda"
)
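
The `os.system` call above builds one long shell string, which is where a leftover `true` value is easy to overlook. As a sketch only (the helper `build_attention_mask_cmd` is hypothetical and not part of the recipes), the same command could be assembled as an argument list, with `--use_cuda` appended as a bare flag:

```python
import sys
from typing import List


def build_attention_mask_cmd(
    model_path: str, config_path: str, use_cuda: bool = True
) -> List[str]:
    """Assemble the compute_attention_masks.py call as an argv list."""
    cmd = [
        sys.executable,
        "TTS/bin/compute_attention_masks.py",
        "--model_path", model_path,
        "--config_path", config_path,
        "--dataset", "ljspeech",
        "--dataset_metafile", "metadata.csv",
        "--data_path", "./recipes/ljspeech/LJSpeech-1.1/",
    ]
    if use_cuda:
        # Bare flag: after this PR the option no longer takes a value.
        cmd.append("--use_cuda")
    return cmd


cmd = build_attention_mask_cmd("model.pth", "config.json")
cpu_cmd = build_attention_mask_cmd("model.pth", "config.json", use_cuda=False)
```

Such a list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with paths and raises if the script fails.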

# INITIALIZE THE AUDIO PROCESSOR
2 changes: 1 addition & 1 deletion recipes/ljspeech/fast_speech/train_fast_speech.py
@@ -64,7 +64,7 @@
model_path, config_path, _ = manager.download_model("tts_models/en/ljspeech/tacotron2-DCA")
# TODO: make compute_attention python callable
os.system(
f"python TTS/bin/compute_attention_masks.py --model_path {model_path} --config_path {config_path} --dataset ljspeech --dataset_metafile metadata.csv --data_path ./recipes/ljspeech/LJSpeech-1.1/ --use_cuda true"
f"python TTS/bin/compute_attention_masks.py --model_path {model_path} --config_path {config_path} --dataset ljspeech --dataset_metafile metadata.csv --data_path ./recipes/ljspeech/LJSpeech-1.1/ --use_cuda"
)

# INITIALIZE THE AUDIO PROCESSOR