Commit

typos and minor fixes (#2508)
* Update tacotron1-2.md

* Update README.md

* Update Tutorial_2_train_your_first_TTS_model.ipynb

* Update synthesizer.py

There is no arg called --speaker_name

* Update formatting_your_dataset.md

* Update AnalyzeDataset.ipynb

* Update AnalyzeDataset.ipynb

* Update AnalyzeDataset.ipynb

* Update finetuning.md

* Update train_yourtts.py

* Update train_yourtts.py

* Update train_yourtts.py

* Update finetuning.md
prakharpbuf authored Apr 26, 2023
1 parent 2071088 commit c1875f6
Showing 8 changed files with 18 additions and 18 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -312,7 +312,7 @@ tts.tts_to_file(text="This is a test.", file_path=OUTPUT_PATH, emotion="Happy",
#### Multi-speaker Models
- - List the available speakers and choose as <speaker_id> among them:
+ - List the available speakers and choose a <speaker_id> among them:
```
$ tts --model_name "<language>/<dataset>/<model_name>" --list_speaker_idxs
4 changes: 2 additions & 2 deletions TTS/utils/synthesizer.py
@@ -269,8 +269,8 @@ def tts(

elif not speaker_name and not speaker_wav:
raise ValueError(
- " [!] Look like you use a multi-speaker model. "
- "You need to define either a `speaker_name` or a `speaker_wav` to use a multi-speaker model."
+ " [!] Looks like you are using a multi-speaker model. "
+ "You need to define either a `speaker_idx` or a `speaker_wav` to use a multi-speaker model."
)
else:
speaker_embedding = None
6 changes: 3 additions & 3 deletions docs/source/finetuning.md
@@ -11,15 +11,15 @@ them and fine-tune it for your own dataset. This will help you in two main ways:
Since a pre-trained model has already learned features that are relevant for the task, it will converge faster on
a new dataset. This will reduce the cost of training and let you experiment faster.

- 2. Better resutls with small datasets
+ 2. Better results with small datasets

Deep learning models are data hungry and they give better performance with more data. However, it is not always
possible to have this abundance, especially in specific domains. For instance, the LJSpeech dataset, that we released most of
our English models with, is almost 24 hours long. It takes weeks to record this amount of data with
the help of a voice actor.

Fine-tuning comes to the rescue in this case. You can take one of our pre-trained models and fine-tune it on your own
- speech dataset and achive reasonable results with only a couple of hours of data.
+ speech dataset and achieve reasonable results with only a couple of hours of data.

However, note that, fine-tuning does not ensure great results. The model performance is still depends on the
{ref}`dataset quality <what_makes_a_good_dataset>` and the hyper-parameters you choose for fine-tuning. Therefore,
@@ -35,7 +35,7 @@ them and fine-tune it for your own dataset. This will help you in two main ways:

2. Choose the model you want to fine-tune.

- You can list the availabe models in the command line with
+ You can list the available models in the command line with

```bash
tts --list_models
2 changes: 1 addition & 1 deletion docs/source/formatting_your_dataset.md
@@ -125,4 +125,4 @@ See `TTS.tts.datasets.TTSDataset`, a generic `Dataset` implementation for the `t
See `TTS.vocoder.datasets.*`, for different `Dataset` implementations for the `vocoder` models.

See `TTS.utils.audio.AudioProcessor` that includes all the audio processing and feature extraction functions used in a
- `Dataset` implementation. Feel free to add things as you need.passed
+ `Dataset` implementation. Feel free to add things as you need.
2 changes: 1 addition & 1 deletion docs/source/models/tacotron1-2.md
@@ -12,7 +12,7 @@ Vanilla Tacotron models are slow at inference due to the auto-regressive* nature

Tacotron also uses a Prenet module with Dropout that projects the model’s previous output before feeding it to the decoder again. The paper and most of the implementations use the Dropout layer even in inference and they report the attention fails or the voice quality degrades otherwise. But the issue with that, you get a slightly different output speech every time you run the model.

- Tsraining the attention is notoriously problematic in Tacoron models. Especially, in inference, for some input sequences, the alignment fails and causes the model to produce unexpected results. There are many different methods proposed to improve the attention.
+ Training the attention is notoriously problematic in Tacoron models. Especially, in inference, for some input sequences, the alignment fails and causes the model to produce unexpected results. There are many different methods proposed to improve the attention.

After hundreds of experiments, @ 🐸TTS we suggest Double Decoder Consistency that leads to the most robust model performance.

2 changes: 1 addition & 1 deletion notebooks/Tutorial_2_train_your_first_TTS_model.ipynb
@@ -44,7 +44,7 @@
"\n",
"### **First things first**: we need some data.\n",
"\n",
- "We're training a Text-to-Speech model, so we need some _text_ and we need some _speech_. Specificially, we want _transcribed speech_. The speech must be divided into audio clips and each clip needs transcription. More details about data requirements such as recording characteristics, background noise abd vocabulary coverage can be found in the [🐸TTS documentation](https://tts.readthedocs.io/en/latest/formatting_your_dataset.html).\n",
+ "We're training a Text-to-Speech model, so we need some _text_ and we need some _speech_. Specificially, we want _transcribed speech_. The speech must be divided into audio clips and each clip needs transcription. More details about data requirements such as recording characteristics, background noise and vocabulary coverage can be found in the [🐸TTS documentation](https://tts.readthedocs.io/en/latest/formatting_your_dataset.html).\n",
"\n",
"If you have a single audio file and you need to **split** it into clips. It is also important to use a lossless audio file format to prevent compression artifacts. We recommend using **wav** file format.\n",
"\n",
4 changes: 2 additions & 2 deletions notebooks/dataset_analysis/AnalyzeDataset.ipynb
@@ -45,7 +45,7 @@
"source": [
"NUM_PROC = 8\n",
"DATASET_CONFIG = BaseDatasetConfig(\n",
- name=\"ljspeech\", meta_file_train=\"metadata.csv\", path=\"/absolute/path/to/your/dataset/\"\n",
+ formatter=\"ljspeech\", meta_file_train=\"metadata.csv\", path=\"/absolute/path/to/your/dataset/\"\n",
")"
]
},
@@ -64,7 +64,7 @@
" cols = line.split(\"|\")\n",
" wav_file = os.path.join(root_path, \"wavs\", cols[0] + \".wav\") \n",
" text = cols[1]\n",
- items.append({\"text\": text, \"audio_file\": wav_file, \"speaker_name\": speaker_name})\n",
+ items.append({\"text\": text, \"audio_file\": wav_file, \"speaker_name\": speaker_name, \"root_path\": root_path})\n",
" return items"
]
},
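
The notebook change above adds a `root_path` key to every item the custom formatter returns. A minimal standalone sketch of such a formatter follows, assuming an LJSpeech-style layout (a pipe-delimited metadata file next to a `wavs/` folder). The function name `ljspeech_style_formatter`, the default speaker name, and the demo data are hypothetical and only illustrate the shape of the returned items; this is not the notebook's exact code and it does not import 🐸TTS.

```python
import os
import tempfile


def ljspeech_style_formatter(root_path, meta_file, speaker_name="ljspeech"):
    """Parse a pipe-delimited metadata file into a list of sample dicts.

    Hypothetical helper: mirrors the keys shown in the diff above
    (text, audio_file, speaker_name, root_path).
    """
    items = []
    with open(os.path.join(root_path, meta_file), encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            cols = line.split("|")
            wav_file = os.path.join(root_path, "wavs", cols[0] + ".wav")
            items.append(
                {
                    "text": cols[1],
                    "audio_file": wav_file,
                    "speaker_name": speaker_name,
                    "root_path": root_path,  # the key added by this commit
                }
            )
    return items


# Build a tiny throwaway metadata file so the sketch runs end to end.
demo_root = tempfile.mkdtemp()
with open(os.path.join(demo_root, "metadata.csv"), "w", encoding="utf-8") as f:
    f.write("clip_0001|Hello world.\nclip_0002|A second clip.\n")

demo_items = ljspeech_style_formatter(demo_root, "metadata.csv")
```

Each returned dict carries the dataset root alongside the audio path, which is the only behavioral difference the hunk introduces.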
14 changes: 7 additions & 7 deletions recipes/vctk/yourtts/train_yourtts.py
@@ -32,7 +32,7 @@
# If you want to do transfer learning and speedup your training you can set here the path to the original YourTTS model
RESTORE_PATH = None # "/root/.local/share/tts/tts_models--multilingual--multi-dataset--your_tts/model_file.pth"

- # This paramter is usefull to debug, it skips the training epochs and just do the evaluation and produce the test sentences
+ # This paramter is useful to debug, it skips the training epochs and just do the evaluation and produce the test sentences
SKIP_TRAIN_EPOCH = False

# Set here the batch size to be used in training and evaluation
@@ -78,7 +78,7 @@
], # Ignore the test speakers to full replicate the paper experiment
)

- # Add here all datasets configs, in our case we just want to train with the VCTK dataset then we need to add just VCTK. Note: If you want to added new datasets just added they here and it will automatically compute the speaker embeddings (d-vectors) for this new dataset :)
+ # Add here all datasets configs, in our case we just want to train with the VCTK dataset then we need to add just VCTK. Note: If you want to add new datasets, just add them here and it will automatically compute the speaker embeddings (d-vectors) for this new dataset :)
DATASETS_CONFIG_LIST = [vctk_config]

### Extract speaker embeddings
@@ -123,23 +123,23 @@
num_mels=80,
)

- # Init VITSArgs setting the arguments that is needed for the YourTTS model
+ # Init VITSArgs setting the arguments that are needed for the YourTTS model
model_args = VitsArgs(
d_vector_file=D_VECTOR_FILES,
use_d_vector_file=True,
d_vector_dim=512,
num_layers_text_encoder=10,
speaker_encoder_model_path=SPEAKER_ENCODER_CHECKPOINT_PATH,
speaker_encoder_config_path=SPEAKER_ENCODER_CONFIG_PATH,
- resblock_type_decoder="2", # On the paper, we accidentally trained the YourTTS using ResNet blocks type 2, if you like you can use the ResNet blocks type 1 like the VITS model
- # Usefull parameters to enable the Speaker Consistency Loss (SCL) discribed in the paper
+ resblock_type_decoder="2", # In the paper, we accidentally trained the YourTTS using ResNet blocks type 2, if you like you can use the ResNet blocks type 1 like the VITS model
+ # Useful parameters to enable the Speaker Consistency Loss (SCL) described in the paper
# use_speaker_encoder_as_loss=True,
- # Usefull parameters to the enable multilingual training
+ # Useful parameters to enable multilingual training
# use_language_embedding=True,
# embedded_language_dim=4,
)

- # General training config, here you can change the batch size and others usefull parameters
+ # General training config, here you can change the batch size and others useful parameters
config = VitsConfig(
output_path=OUT_PATH,
model_args=model_args,