Update axolotl image and other dependencies #28

Merged
28 commits merged on Feb 9, 2024

Commits
7feba35
Remove environment key from CI yaml
mwaskom Feb 6, 2024
247c994
Update base image spec to axolotl 0.4.0
mwaskom Feb 6, 2024
d1b236c
Update deepspeed config location
mwaskom Feb 6, 2024
1abc4f8
Remove redundant configuration flags from merge cmdline
mwaskom Feb 6, 2024
d09043a
Disable debug mode in codellama config
mwaskom Feb 6, 2024
f6f117b
Try re-enabling mistral flash attention
mwaskom Feb 6, 2024
9a5a848
Revert some of the CI training overrides
mwaskom Feb 6, 2024
6bc8e7e
Don't truncate data
mwaskom Feb 6, 2024
efd0984
Try a config without sample packing
mwaskom Feb 6, 2024
0ae5b47
Don't pad to sequence length
mwaskom Feb 6, 2024
1bb3630
Reinstate CI data truncation
mwaskom Feb 6, 2024
79b123c
Set base GPU config to use A100-40GB
mwaskom Feb 6, 2024
0b8735b
Remove sample packing and standardize batch / LR params for all models
mwaskom Feb 6, 2024
93ee98a
Standardize sequence_len for mistral
mwaskom Feb 6, 2024
b08062e
Use consistent fractional val_set_size
mwaskom Feb 6, 2024
89790fc
Disable quantization in llama config
mwaskom Feb 6, 2024
e855a8b
Fix CI val_set_size
mwaskom Feb 6, 2024
791a5af
Try simple torch optimizer
mwaskom Feb 7, 2024
0d34607
Try reverting deepspeed workaround
mwaskom Feb 7, 2024
d57e2b6
Fix type annotation
mwaskom Feb 7, 2024
e2a56be
Add a step to assert that the evaluation loss is reasonable
mwaskom Feb 7, 2024
c6ba1ab
Fix run name
mwaskom Feb 7, 2024
5bfe720
Improve results table extraction
mwaskom Feb 7, 2024
9db01a4
Fix direction of loss assertion
mwaskom Feb 7, 2024
e963fed
Don't call the remote data my_data
mwaskom Feb 7, 2024
50912c4
Remove huggingface secret (it's not needed for these models)
mwaskom Feb 7, 2024
524a16f
Bump huggingface util pins
mwaskom Feb 7, 2024
b289ced
Update README
mwaskom Feb 7, 2024
9 changes: 6 additions & 3 deletions .github/workflows/ci-cd.yml
@@ -4,7 +4,6 @@ on: pull_request

jobs:
  test:
    environment: CI
    name: Test
    runs-on: ubuntu-latest
    strategy:
@@ -28,7 +27,7 @@ jobs:
      - name: Install Modal
        run: |
          python -m pip install --upgrade pip
          pip install modal pyyaml
          pip install modal pyyaml pandas

      - name: Prep config and data for CI
        run: |
@@ -39,4 +38,8 @@ jobs:

      - name: Run training job on Modal
        run: |
          GPU_MEM=40 modal run src.train --config=config/${{ matrix.config }}.yml --data=data/sqlqa.jsonl
          modal run src.train --config=config/${{ matrix.config }}.yml --data=data/sqlqa.jsonl

      - name: Check training results
        run: |
          python ci/check_loss.py
4 changes: 4 additions & 0 deletions .gitignore
@@ -158,3 +158,7 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/


# Local file written by the training script
.last_run_name
31 changes: 15 additions & 16 deletions README.md
@@ -39,7 +39,7 @@ cd llm-finetuning
```
3. Launch a training job:
```bash
modal run --detach src.train --config=config/codellama.yml --data=data/sqlqa.jsonl
modal run --detach src.train --config=config/mistral.yml --data=data/sqlqa.jsonl
```

4. Try the model from a completed training run. You can select a folder via `modal volume ls example-runs-vol`, and then specify the training folder with the `--run-folder` flag (something like `/runs/axo-2023-11-24-17-26-66e8`) for inference:
@@ -48,7 +48,7 @@ modal run --detach src.train --config=config/codellama.yml --data=data/sqlqa.jsonl
modal run -q src.inference --run-folder /runs/<run_tag>
```

The default configuration fine-tunes CodeLlama Instruct 7B on a text-to-SQL dataset for five epochs (takes a few minutes) as a proof of concept. It uses DeepSpeed ZeRO-3 to shard the model state across 2 A100s. Inference on the fine-tuned model displays conformity to the output structure (`[SQL] ... [/SQL]`). To achieve better results, you would need to use more data! Refer to the full development section below.
Our quickstart example trains a 7B model on a text-to-SQL dataset as a proof of concept (it takes just a few minutes). It uses DeepSpeed ZeRO-3 to shard the model state across 2 A100s. Inference on the fine-tuned model displays conformity to the output structure (`[SQL] ... [/SQL]`). To achieve better results, you would need to use more data! Refer to the full development section below.

5. (Optional) Launch the GUI for easy observability of training status.

@@ -76,18 +76,18 @@ The rest of the code are helpers for _calling_ these three functions. There are

### Config

You can `example_configs` for quick start with different models. We recommend duplicating one to `src/config.yml` and modifying as you need. See an overview of Axolotl's config options [here](https://github.com/OpenAccess-AI-Collective/axolotl#config). The most important options to consider are:
You can view some example configurations in `config` for a quick start with different models. See an overview of Axolotl's config options [here](https://github.com/OpenAccess-AI-Collective/axolotl#config). The most important options to consider are:

**Model**
```yaml
base_model: codellama/CodeLlama-7b-Instruct-hf
base_model: mistralai/Mistral-7B-v0.1
```

**Dataset** (by default we upload a local .jsonl file from the `src` folder, but you can see all dataset options [here](https://github.com/OpenAccess-AI-Collective/axolotl#dataset))
**Dataset** (You can see all dataset options [here](https://github.com/OpenAccess-AI-Collective/axolotl#dataset))
```yaml
datasets:
  # This will be the path used for the data when it is saved to the Volume in the cloud.
  - path: my_data.jsonl
  - path: data.jsonl
    ds_type: json
    type:
      # JSONL file contains question, context, answer fields per line.
@@ -104,31 +104,31 @@

**LoRA**
```yaml
adapter: lora # for qlora, or leave blank for full finetune
adapter: lora # for qlora, or leave blank for full finetune (requires much more GPU memory!)
lora_r: 16
lora_alpha: 32 # alpha = 2 x rank is a good rule of thumb.
lora_alpha: 32 # alpha = 2 x rank is a good rule of thumb.
lora_dropout: 0.05
lora_target_linear: true # target all linear layers
lora_target_linear: true # target all linear layers
```

### Custom Dataset

Axolotl supports many dataset formats ([see more](https://github.com/OpenAccess-AI-Collective/axolotl#dataset)). We recommend adding your custom dataset as a .jsonl file in the `src` folder and making the appropriate modifications to your config.
Axolotl supports many dataset formats ([see more](https://github.com/OpenAccess-AI-Collective/axolotl#dataset)). We recommend adding your custom dataset as a .jsonl file in the `data` folder and making the appropriate modifications to your config.
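
For illustration, here is a minimal sketch of writing one such record. Only the field names (question, context, answer) come from the config comments in this repo; the values and the output filename are invented.

```python
import json

# Hypothetical sample record: the field names match the config comments above
# (question, context, answer); the values and the filename are made up.
row = {
    "question": "How many users signed up in 2023?",
    "context": "CREATE TABLE users (id INT, signup_date DATE)",
    "answer": "SELECT COUNT(id) FROM users WHERE signup_date >= '2023-01-01';",
}

with open("data/example.jsonl", "a") as f:
    f.write(json.dumps(row) + "\n")
```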

**Multi-GPU training**

We recommend [DeepSpeed](https://github.com/microsoft/DeepSpeed) for multi-GPU training, which is easy to set up. Axolotl provides several default deepspeed JSON [configurations](https://github.com/OpenAccess-AI-Collective/axolotl/tree/main/deepspeed) and Modal makes it easy to [attach multiple GPUs](https://modal.com/docs/guide/gpu#gpu-acceleration) of any type in code, so all you need to do is specify which of these configs you'd like to use.

In your `config.yml`:
```yaml
deepspeed: /root/axolotl/deepspeed/zero3.json
deepspeed: /root/axolotl/deepspeed_configs/zero3_bf16.json
```

In `train.py`:
```python
N_GPUS = 2
GPU_MEM = 80
GPU_CONFIG = modal.gpu.A100(count=N_GPUS, memory=GPU_MEM) # you can also change this to use A10Gs or T4s
GPU_MEM = 40
GPU_CONFIG = modal.gpu.A100(count=N_GPUS, memory=GPU_MEM) # you can also change this to use A10Gs or T4s
```

**Logging with Weights and Biases**
@@ -161,16 +161,15 @@ The script reads two local files containing the config information and the dataset

When you make local changes to either your config or data, they will be used for your next training run.

The default configuration fine-tunes CodeLlama Instruct 7B to understand Modal documentation for five epochs as a proof of concept. It uses DeepSpeed ZeRO-3 to shard the model state across 2 A100s. To achieve better results, you would need to use more data and train for more epochs.

**Inference**

To try a model from a completed run, you can select a folder via `modal volume ls example-runs-vol`, and then specify the training folder for inference:

```bash
modal run -q src.inference::inference_main --run-folder /runs/axo-2023-11-24-17-26-66e8
modal run -q src.inference::inference_main --run-folder=...
```

The training script writes the most recent run name to a local file, `.last_run_name`, for easy access.
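
As a rough sketch, assuming the run folder on the volume is simply `/runs/<run name>` (matching the example path above), you could turn that file into a ready-to-run inference command:

```python
# Minimal sketch: read the run name written by src.train and print an
# inference command. The /runs/<run_name> layout is an assumption based on
# the example path shown earlier in this README.
with open(".last_run_name") as f:
    run_name = f.read().strip()

print(f"modal run -q src.inference::inference_main --run-folder=/runs/{run_name}")
```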

## Using the GUI

31 changes: 31 additions & 0 deletions ci/check_loss.py
@@ -0,0 +1,31 @@
from io import StringIO
import re
import sys

import pandas as pd

from modal import Volume


if __name__ == "__main__":

    with open(".last_run_name", "r") as f:
        run_name = f.read().strip()

    vol = Volume.lookup("example-runs-vol")
    contents = b""
    for chunk in vol.read_file(f"{run_name}/lora-out/README.md"):
        contents += chunk

    m = re.search(r"### Training results\n\n(.+?)#", contents.decode(), flags=re.DOTALL)
    if m is None:
        sys.exit("Could not parse training results from model card")
    else:
        results_text = m.group(1).strip().replace(" ", "")

    results = pd.read_table(StringIO(results_text), sep="|")
    train_loss = float(results["TrainingLoss"].iloc[-1])
    val_loss = float(results["ValidationLoss"].iloc[-1])

    print(f"Loss: {train_loss:.2f} (training), {val_loss:.2f} (validation)")
    sys.exit(val_loss > 0.25)  # Arbitrary threshold
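
To make the parsing above concrete, here is a self-contained sketch run against a fabricated model card. The `TrainingLoss` and `ValidationLoss` column names match what the script indexes after stripping spaces; the other columns, the numbers, and the surrounding headings are invented, since the real table is generated by axolotl in the run's `lora-out/README.md`.

```python
from io import StringIO
import re

import pandas as pd

# Fabricated model-card fragment shaped like the table check_loss.py expects;
# the real contents are written by axolotl, so the columns and values here are
# assumptions for illustration only.
card = """### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.3412        | 1.0   | 16   | 0.2101          |

### Framework versions
"""

m = re.search(r"### Training results\n\n(.+?)#", card, flags=re.DOTALL)
results_text = m.group(1).strip().replace(" ", "")

results = pd.read_table(StringIO(results_text), sep="|")
print(float(results["ValidationLoss"].iloc[-1]))  # 0.2101
```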
15 changes: 7 additions & 8 deletions ci/prep_for_ci.py
@@ -6,20 +6,19 @@
@click.option("--config")
@click.option("--data")
def main(config: str, data: str):
"""Set the config for lighter-weight training and truncate the dataset."""
"""Set the config to train for only one epoch and truncate the dataset."""
train_set_size = 1000
val_set_size = 64
with open(config) as f:
cfg = yaml.safe_load(f.read())
cfg["sequence_len"] = 1024
cfg["val_set_size"] = 100
cfg["eval_batch_size"] = 2
cfg["micro_batch_size"] = 2
cfg["num_epochs"] = 2
cfg.pop("eval_steps", None)
cfg["val_set_size"] = val_set_size
cfg["num_epochs"] = 1
cfg.pop("eval_steps", None) # Evaluate once at the end of the epoch
with open(config, "w") as f:
yaml.dump(cfg, f)

with open(data) as f:
data_truncated = f.readlines()[:1000]
data_truncated = f.readlines()[: train_set_size + val_set_size]
with open(data, "w") as f:
f.writelines(data_truncated)
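
For local testing, a plausible invocation (the actual CI command lives in the collapsed workflow hunk above) would be `python ci/prep_for_ci.py --config=config/mistral.yml --data=data/sqlqa.jsonl`; note that the script rewrites both the config and the data file in place.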

20 changes: 10 additions & 10 deletions config/codellama.yml
@@ -9,7 +9,7 @@ strict: false

datasets:
  # This will be the path used for the data when it is saved to the Volume in the cloud.
  - path: my_data.jsonl
  - path: data.jsonl
    ds_type: json
    type:
      # JSONL file contains question, context, answer fields per line.
@@ -28,9 +28,9 @@ val_set_size: 0.05
output_dir: ./lora-out

sequence_len: 4096
sample_packing: true
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: true
pad_to_sequence_len: false

adapter: lora
lora_model_dir:
@@ -46,15 +46,15 @@ wandb_watch:
wandb_run_id:

gradient_accumulation_steps: 1
micro_batch_size: 16
num_epochs: 5
optimizer: adamw_bnb_8bit
micro_batch_size: 32
num_epochs: 4
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.0002
learning_rate: 0.0001

train_on_inputs: false
group_by_length: false
bf16: true
bf16: auto
fp16: false
tf32: false

@@ -70,8 +70,8 @@ flash_attention: true
warmup_steps: 10
eval_steps: 0.05
save_steps:
debug: True
deepspeed: /root/axolotl/deepspeed/zero3.json
debug:
deepspeed: /root/axolotl/deepspeed_configs/zero3_bf16.json
weight_decay: 0.0
fsdp:
fsdp_config:
21 changes: 10 additions & 11 deletions config/llama-2.yml
@@ -3,13 +3,13 @@ model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true

load_in_8bit: true
load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  # This will be the path used for the data when it is saved to the Volume in the cloud.
  - path: my_data.jsonl
  - path: data.jsonl
    ds_type: json
    type:
      # JSONL file contains question, context, answer fields per line.
@@ -28,14 +28,14 @@ val_set_size: 0.05
output_dir: ./lora-out

sequence_len: 4096
sample_packing: true
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: true
pad_to_sequence_len: false

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
@@ -44,14 +44,13 @@ wandb_project:
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
gradient_accumulation_steps: 1
micro_batch_size: 32
num_epochs: 4
optimizer: adamw_bnb_8bit
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.0002
learning_rate: 0.0001

train_on_inputs: false
group_by_length: false
24 changes: 12 additions & 12 deletions config/mistral.yml
@@ -9,7 +9,7 @@ strict: false

datasets:
  # This will be the path used for the data when it is saved to the Volume in the cloud.
  - path: my_data.jsonl
  - path: data.jsonl
    ds_type: json
    type:
      # JSONL file contains question, context, answer fields per line.
@@ -24,13 +24,13 @@ datasets:
{instruction} [/INST]

dataset_prepared_path:
val_set_size: 32
val_set_size: 0.05
output_dir: ./lora-out

sequence_len: 2048
sample_packing: true
sequence_len: 4096
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: true
pad_to_sequence_len: false

adapter: lora
lora_model_dir:
@@ -46,13 +46,13 @@ wandb_watch:
wandb_run_id:

gradient_accumulation_steps: 1
micro_batch_size: 16
num_epochs: 1
optimizer: adamw_bnb_8bit
micro_batch_size: 32
num_epochs: 4
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.0002
learning_rate: 0.0001

bf16: true
bf16: auto
fp16: false
tf32: false
train_on_inputs: false
@@ -64,12 +64,12 @@ resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false
flash_attention: true

warmup_steps: 10
save_steps:
debug:
deepspeed: /root/axolotl/deepspeed/zero3.json
deepspeed: /root/axolotl/deepspeed_configs/zero3_bf16.json
weight_decay: 0.0
fsdp:
fsdp_config: