Transformer-based rap lyrics generator, heavily based on the Rapformer paper.
We use the dataset from Kaggle. To create the train and test datasets, perform the following:
- Download the files artists-data.csv and lyrics-data.csv from Kaggle and place them in the data/ folder in the project root.
- Run the following to generate the files that will contain the train and test examples:
python3 data.py
This will create 4 files, pretrain_[x|y].txt and finetune_[x|y], in the data/ folder. After this step, the datasets are available through LyricsDatasetProvider and LyricsDataset from dataset.py.
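To inspect the generated files directly, without going through dataset.py, something like the following minimal sketch may help. It assumes that each x file holds one input per line and that the matching y file holds the corresponding target on the same line number; this layout is an assumption, not something the README states. The helper name read_pairs is hypothetical and is not part of this project.

```python
from pathlib import Path


def read_pairs(x_path, y_path):
    """Read two text files and pair them up line by line.

    Assumption: line i of x_path is the input for line i of y_path.
    """
    x_lines = Path(x_path).read_text(encoding="utf-8").splitlines()
    y_lines = Path(y_path).read_text(encoding="utf-8").splitlines()
    if len(x_lines) != len(y_lines):
        raise ValueError(
            f"line count mismatch: {len(x_lines)} inputs vs {len(y_lines)} targets"
        )
    return list(zip(x_lines, y_lines))


# Example call (paths follow the file names mentioned above):
# pairs = read_pairs("data/pretrain_x.txt", "data/pretrain_y.txt")
```

For actual training, LyricsDatasetProvider and LyricsDataset remain the intended entry points.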
After that, run the following to reduce the dataset sizes and the number of distinct tokens:
python3 filter.py
This will overwrite the pretrain_[x|y] and finetune_[x|y] files.
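The exact rules applied by filter.py are not described here. As a rough illustration of how a filter can shrink both the dataset and the vocabulary at once, the sketch below drops every example that contains a rare token and optionally caps the number of examples. The function name filter_pairs, the whitespace tokenization, and the parameters min_count and max_examples are all assumptions made for this example and may differ from what filter.py does.

```python
from collections import Counter


def filter_pairs(pairs, min_count=2, max_examples=None):
    """Drop (input, target) pairs that contain rare tokens.

    Hypothetical sketch: tokens are split on whitespace, and a pair is kept
    only if every one of its tokens occurs at least min_count times overall.
    """
    # Count token occurrences across both sides of every pair.
    counts = Counter(
        tok for x, y in pairs for tok in (x + " " + y).split()
    )
    kept = [
        (x, y)
        for x, y in pairs
        if all(counts[tok] >= min_count for tok in (x + " " + y).split())
    ]
    # Optionally cap the dataset size.
    return kept if max_examples is None else kept[:max_examples]
```

Removing whole examples rather than replacing rare tokens keeps the remaining lines intact, at the cost of discarding some data.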
To use Wandb, run:
pip install wandb
wandb login
and paste your API key.
train.py accepts command-line parameters. To list the available ones, run:
python train.py --help
To train the model with the chosen parameters, run:
python train.py [params]
Alternatively, use the training script, which makes it easy to change previously used parameters:
./run_train.sh
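For readers unfamiliar with how a script exposes its parameters through --help, here is a minimal argparse sketch. Every flag shown (--epochs, --batch-size, --lr, --wandb) and every default value is hypothetical and chosen for illustration only; the real list is whatever python train.py --help prints.

```python
import argparse


def build_parser():
    """Build an example parser; all flags and defaults are hypothetical."""
    parser = argparse.ArgumentParser(
        description="Illustrative training options (not the real train.py flags)."
    )
    parser.add_argument("--epochs", type=int, default=10,
                        help="number of training epochs")
    parser.add_argument("--batch-size", type=int, default=32,
                        help="examples per batch")
    parser.add_argument("--lr", type=float, default=1e-4,
                        help="learning rate")
    parser.add_argument("--wandb", action="store_true",
                        help="enable experiment logging")
    return parser


# Parsing an explicit argument list instead of sys.argv:
args = build_parser().parse_args(["--epochs", "3", "--lr", "0.001"])
```

argparse generates the --help output from the help strings, so the listing stays in sync with the options the script actually accepts.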
Run infer.py with the appropriate parameters. This will generate a results.txt file.
Run rhyme_enhancement.py. This assumes that a file named results.txt containing the generated examples already exists. It prints the rhyme-enhanced examples to the screen.