This project is based on 🤗 Transformers. This tutorial shows you how to train a GPT-2 model for your own language (such as Chinese or Japanese) with just a few lines of code, using TensorFlow 2.
You can try this project in Colab right now.
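Under the hood the model is simply a GPT-2 language model built from a custom config with 🤗 Transformers' TensorFlow classes. As a rough sketch (the vocabulary size and model dimensions below are placeholders for illustration, not the project's defaults, which are presumably defined in configs/train.py):

from transformers import GPT2Config, TFGPT2LMHeadModel

# placeholder hyperparameters for illustration only
config = GPT2Config(vocab_size=8000, n_positions=512, n_embd=512, n_layer=6, n_head=8)
model = TFGPT2LMHeadModel(config)  # a randomly initialised GPT-2, ready to train from scratch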
├── configs
│   ├── test.py
│   └── train.py
├── build_tokenizer.py
├── predata.py
├── predict.py
└── train.py
git clone [email protected]:mymusise/gpt2-quickly.git
cd gpt2-quickly
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
This is an example of a raw dataset: raw.txt
python cut_words.py
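cut_words.py prepares the raw Chinese text for tokenization by segmenting it into words. The exact implementation is not shown here; a minimal sketch of that step, assuming jieba is used for segmentation and with the file paths as assumptions:

import jieba

# assumed input/output paths for illustration
with open("dataset/raw.txt", encoding="utf-8") as f:
    text = f.read()

segmented = " ".join(jieba.cut(text))  # insert spaces between segmented words

with open("dataset/cut.txt", "w", encoding="utf-8") as f:
    f.write(segmented)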
python build_tokenizer.py
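build_tokenizer.py trains a vocabulary on the segmented corpus. A minimal sketch with the 🤗 tokenizers library, assuming a WordPiece tokenizer; the corpus path and vocabulary size are chosen only for illustration:

from tokenizers import BertWordPieceTokenizer

tokenizer = BertWordPieceTokenizer()
tokenizer.train(files=["dataset/cut.txt"], vocab_size=8000, min_frequency=2)  # assumed path and vocab size
tokenizer.save_model("tokenizer/")  # writes vocab.txt, loadable later with BertTokenizerFast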
python predata.py --n_processes=2
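predata.py encodes the corpus and slices it into fixed-length training examples; the --n_processes flag presumably splits the work across that many processes. The core idea, sketched with an assumed block size and assumed paths:

from transformers import BertTokenizerFast

BLOCK_SIZE = 512  # assumed context length

tokenizer = BertTokenizerFast.from_pretrained("tokenizer/")
with open("dataset/cut.txt", encoding="utf-8") as f:
    ids = tokenizer(f.read(), add_special_tokens=False)["input_ids"]

# cut the token stream into contiguous BLOCK_SIZE-length examples
examples = [ids[i:i + BLOCK_SIZE] for i in range(0, len(ids) - BLOCK_SIZE + 1, BLOCK_SIZE)]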
python train.py
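Training is then a standard Keras fit loop over those examples. A minimal sketch, assuming a recent 🤗 Transformers version in which the TF model computes its own language-modelling loss when labels are supplied and no loss is passed to compile():

import tensorflow as tf
from transformers import GPT2Config, TFGPT2LMHeadModel

# `examples` is the list of token-id blocks from the preprocessing sketch above
inputs = tf.constant(examples)
dataset = tf.data.Dataset.from_tensor_slices({"input_ids": inputs, "labels": inputs}).batch(8)

model = TFGPT2LMHeadModel(GPT2Config(vocab_size=8000, n_positions=512))  # placeholder config
model.compile(optimizer=tf.keras.optimizers.Adam(3e-5))  # no loss given: the model's internal LM loss is used
model.fit(dataset, epochs=3)
model.save_pretrained("model/")  # assumed output directory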
python predict.py
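predict.py samples text from the trained checkpoint. Roughly, with the prompt, paths and sampling parameters as examples only:

from transformers import BertTokenizerFast, TFGPT2LMHeadModel

tokenizer = BertTokenizerFast.from_pretrained("tokenizer/")  # assumed paths
model = TFGPT2LMHeadModel.from_pretrained("model/")

input_ids = tokenizer("今天天气", return_tensors="tf")["input_ids"]
output = model.generate(input_ids, max_length=50, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.decode(output[0], skip_special_tokens=True))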
ENV=FINETUNE python finetune.py
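Fine-tuning follows the same loop as training, except the model starts from the existing checkpoint rather than a fresh config. A sketch under the same assumptions as above, where finetune_dataset would be built from the new corpus the same way as the training set:

import tensorflow as tf
from transformers import TFGPT2LMHeadModel

model = TFGPT2LMHeadModel.from_pretrained("model/")      # resume from the previous checkpoint
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5))  # typically a smaller learning rate for fine-tuning
model.fit(finetune_dataset, epochs=1)                    # finetune_dataset: built like the training dataset above
model.save_pretrained("model_finetuned/")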