Update axolotl image and other dependencies (#28)
* Remove environment key from CI yaml

* Update base image spec to axolotl 0.4.0

* Update deepspeed config location

* Remove redundant configuration flags from merge cmdline

* Disable debug mode in codellama config

* Try re-enabling mistral flash attention

* Revert some of the CI training overrides

* Don't truncate data

* Try a config without sample packing

* Don't pad to sequence length

* Reinstate CI data truncation

* Set base GPU config to use A100-40GB

* Remove sample packing and standardize batch / LR params for all models

* Standardize sequence_len for mistral

* Use consistent fractional val_set_size

* Disable quantization in llama config

* Fix CI val_set_size

* Try simple torch optimizer

* Try reverting deepspeed workaround

* Fix type annotation

* Add a step to assert that the evaluation loss is reasonable

* Fix run name

* Improve results table extraction

* Fix direction of loss assertion

* Don't call the remote data my_data

* Remove huggingface secret (it's not needed for these models)

* Bump huggingface util pins

* Update README
mwaskom authored Feb 9, 2024
1 parent 3442b1f commit 62cfb65
Showing 10 changed files with 116 additions and 89 deletions.
9 changes: 6 additions & 3 deletions .github/workflows/ci-cd.yml
@@ -4,7 +4,6 @@ on: pull_request

jobs:
  test:
-    environment: CI
    name: Test
    runs-on: ubuntu-latest
    strategy:
@@ -28,7 +27,7 @@ jobs:
      - name: Install Modal
        run: |
          python -m pip install --upgrade pip
-          pip install modal pyyaml
+          pip install modal pyyaml pandas
      - name: Prep config and data for CI
        run: |
@@ -39,4 +38,8 @@ jobs:
      - name: Run training job on Modal
        run: |
-          GPU_MEM=40 modal run src.train --config=config/${{ matrix.config }}.yml --data=data/sqlqa.jsonl
+          modal run src.train --config=config/${{ matrix.config }}.yml --data=data/sqlqa.jsonl
+      - name: Check training results
+        run: |
+          python ci/check_loss.py
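Side note on the dropped `GPU_MEM=40` prefix: since this commit makes A100-40GB the base GPU config, the CI-level override became redundant. A hypothetical sketch of how `src/train.py` might consume that variable (the file itself is not shown in this diff):

```python
import os

import modal

# Hypothetical sketch only: read the GPU memory override from the environment,
# defaulting to the A100-40GB base config set by this commit.
N_GPUS = int(os.environ.get("N_GPUS", 2))
GPU_MEM = int(os.environ.get("GPU_MEM", 40))
GPU_CONFIG = modal.gpu.A100(count=N_GPUS, memory=GPU_MEM)
```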
4 changes: 4 additions & 0 deletions .gitignore
@@ -158,3 +158,7 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
+
+
+# Local file written by the training script
+.last_run_name
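The newly ignored `.last_run_name` file is the handoff between the training script and follow-up tooling such as `ci/check_loss.py` below. A minimal sketch of the write side (hypothetical; the training-script code is not part of this diff):

```python
from pathlib import Path

# Hypothetical sketch: record the most recent run name locally so that
# ci/check_loss.py can locate the run's output folder on the Volume.
run_name = "axo-2024-02-09-12-00-abcd"  # illustrative run tag
Path(".last_run_name").write_text(run_name)
```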
31 changes: 15 additions & 16 deletions README.md
@@ -39,7 +39,7 @@ cd llm-finetuning
```
3. Launch a training job:
```bash
-modal run --detach src.train --config=config/codellama.yml --data=data/sqlqa.jsonl
+modal run --detach src.train --config=config/mistral.yml --data=data/sqlqa.jsonl
```

4. Try the model from a completed training run. You can select a folder via `modal volume ls example-runs-vol`, and then specify the training folder with the `--run-folder` flag (something like `/runs/axo-2023-11-24-17-26-66e8`) for inference:
@@ -48,7 +48,7 @@ modal run --detach src.train --config=config/codellama.yml --data=data/sqlqa.jso
modal run -q src.inference --run-folder /runs/<run_tag>
```

-The default configuration fine-tunes CodeLlama Instruct 7B on a text-to-SQL dataset for five epochs (takes a few minutes) as a proof of concept. It uses DeepSpeed ZeRO-3 to shard the model state across 2 A100s. Inference on the fine-tuned model displays conformity to the output structure (`[SQL] ... [/SQL]`). To achieve better results, you would need to use more data! Refer to the full development section below.
+Our quickstart example trains a 7B model on a text-to-SQL dataset as a proof of concept (it takes just a few minutes). It uses DeepSpeed ZeRO-3 to shard the model state across 2 A100s. Inference on the fine-tuned model displays conformity to the output structure (`[SQL] ... [/SQL]`). To achieve better results, you would need to use more data! Refer to the full development section below.

5. (Optional) Launch the GUI for easy observability of training status.

@@ -76,18 +76,18 @@ The rest of the code are helpers for _calling_ these three functions. There are

### Config

-You can `example_configs` for quick start with different models. We recommend duplicating one to `src/config.yml` and modifying as you need. See an overview of Axolotl's config options [here](https://github.com/OpenAccess-AI-Collective/axolotl#config). The most important options to consider are:
+You can view some example configurations in `config` for a quick start with different models. See an overview of Axolotl's config options [here](https://github.com/OpenAccess-AI-Collective/axolotl#config). The most important options to consider are:

**Model**
```yaml
-base_model: codellama/CodeLlama-7b-Instruct-hf
+base_model: mistralai/Mistral-7B-v0.1
```
-**Dataset** (by default we upload a local .jsonl file from the `src` folder, but you can see all dataset options [here](https://github.com/OpenAccess-AI-Collective/axolotl#dataset))
+**Dataset** (You can see all dataset options [here](https://github.com/OpenAccess-AI-Collective/axolotl#dataset))
```yaml
datasets:
  # This will be the path used for the data when it is saved to the Volume in the cloud.
-  - path: my_data.jsonl
+  - path: data.jsonl
    ds_type: json
    type:
      # JSONL file contains question, context, answer fields per line.
@@ -104,31 +104,31 @@
**LoRA**
```yaml
-adapter: lora # for qlora, or leave blank for full finetune
+adapter: lora # for qlora, or leave blank for full finetune (requires much more GPU memory!)
lora_r: 16
lora_alpha: 32 # alpha = 2 x rank is a good rule of thumb.
lora_dropout: 0.05
lora_target_linear: true # target all linear layers
```
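To make the `alpha = 2 x rank` rule of thumb concrete: the LoRA update added to a weight matrix is scaled by `lora_alpha / lora_r`, so these settings keep that multiplier at 2 even if the rank changes. An illustrative sketch (not code from this repo):

```python
# LoRA applies W_eff = W + (lora_alpha / lora_r) * (B @ A),
# where A is (r x k) and B is (d x r). With alpha = 2 x rank,
# the scaling factor stays constant when you retune the rank.
lora_r, lora_alpha = 16, 32
scaling = lora_alpha / lora_r
assert scaling == 2.0
```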
### Custom Dataset
-Axolotl supports many dataset formats ([see more](https://github.com/OpenAccess-AI-Collective/axolotl#dataset)). We recommend adding your custom dataset as a .jsonl file in the `src` folder and making the appropriate modifications to your config.
+Axolotl supports many dataset formats ([see more](https://github.com/OpenAccess-AI-Collective/axolotl#dataset)). We recommend adding your custom dataset as a .jsonl file in the `data` folder and making the appropriate modifications to your config.
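For illustration, one line of such a .jsonl file might look like this (field contents are hypothetical; only the question/context/answer field names come from the configs above):

```json
{"question": "How many users signed up in 2023?", "context": "CREATE TABLE users (id INT, signup_date DATE)", "answer": "SELECT COUNT(*) FROM users WHERE YEAR(signup_date) = 2023;"}
```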

**Multi-GPU training**

We recommend [DeepSpeed](https://github.com/microsoft/DeepSpeed) for multi-GPU training, which is easy to set up. Axolotl provides several default deepspeed JSON [configurations](https://github.com/OpenAccess-AI-Collective/axolotl/tree/main/deepspeed) and Modal makes it easy to [attach multiple GPUs](https://modal.com/docs/guide/gpu#gpu-acceleration) of any type in code, so all you need to do is specify which of these configs you'd like to use.

In your `config.yml`:
```yaml
-deepspeed: /root/axolotl/deepspeed/zero3.json
+deepspeed: /root/axolotl/deepspeed_configs/zero3_bf16.json
```

In `train.py`:
```python
N_GPUS = 2
-GPU_MEM = 80
+GPU_MEM = 40
GPU_CONFIG = modal.gpu.A100(count=N_GPUS, memory=GPU_MEM) # you can also change this to use A10Gs or T4s
```
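As the comment suggests, swapping GPU types is a one-line change, e.g. (a sketch using Modal's `modal.gpu.A10G` class):

```python
GPU_CONFIG = modal.gpu.A10G(count=N_GPUS)  # cheaper, but slower than A100s
```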

**Logging with Weights and Biases**
@@ -161,16 +161,15 @@ The script reads two local files containing the config information and the datas

When you make local changes to either your config or data, they will be used for your next training run.

-The default configuration fine-tunes CodeLlama Instruct 7B to understand Modal documentation for five epochs as a proof of concept. It uses DeepSpeed ZeRO-3 to shard the model state across 2 A100s. To achieve better results, you would need to use more data and train for more epochs.

**Inference**

To try a model from a completed run, you can select a folder via `modal volume ls examples-runs-vol`, and then specify the training folder for inference:

```bash
-modal run -q src.inference::inference_main --run-folder /runs/axo-2023-11-24-17-26-66e8
+modal run -q src.inference::inference_main --run-folder=...
```

+The training script writes the most recent run name to a local file, `.last_run_name`, for easy access.

## Using the GUI

31 changes: 31 additions & 0 deletions ci/check_loss.py
@@ -0,0 +1,31 @@
+from io import StringIO
+import re
+import sys
+
+import pandas as pd
+
+from modal import Volume
+
+
+if __name__ == "__main__":
+
+    with open(".last_run_name", "r") as f:
+        run_name = f.read().strip()
+
+    vol = Volume.lookup("example-runs-vol")
+    contents = b""
+    for chunk in vol.read_file(f"{run_name}/lora-out/README.md"):
+        contents += chunk
+
+    m = re.search(r"### Training results\n\n(.+?)#", contents.decode(), flags=re.DOTALL)
+    if m is None:
+        sys.exit("Could not parse training results from model card")
+    else:
+        results_text = m.group(1).strip().replace(" ", "")
+
+    results = pd.read_table(StringIO(results_text), sep="|")
+    train_loss = float(results["TrainingLoss"].iloc[-1])
+    val_loss = float(results["ValidationLoss"].iloc[-1])
+
+    print(f"Loss: {train_loss:.2f} (training), {val_loss:.2f} (validation)")
+    sys.exit(val_loss > 0.25)  # Arbitrary threshold
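For reference, the regex and pandas parsing above expect the auto-generated model card to contain a pipe table under a `### Training results` heading, roughly like this (values are illustrative; the exact column set, e.g. Epoch and Step, follows the usual Hugging Face trainer model card and is an assumption here):

```markdown
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.3021        | 1.0   | 29   | 0.2146          |
```

After `.replace(" ", "")`, the headers collapse to `TrainingLoss` and `ValidationLoss`, which is exactly what the column lookups assume.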
15 changes: 7 additions & 8 deletions ci/prep_for_ci.py
@@ -6,20 +6,19 @@
@click.option("--config")
@click.option("--data")
def main(config: str, data: str):
-    """Set the config for lighter-weight training and truncate the dataset."""
+    """Set the config to train for only one epoch and truncate the dataset."""
+    train_set_size = 1000
+    val_set_size = 64
    with open(config) as f:
        cfg = yaml.safe_load(f.read())
-    cfg["sequence_len"] = 1024
-    cfg["val_set_size"] = 100
-    cfg["eval_batch_size"] = 2
-    cfg["micro_batch_size"] = 2
-    cfg["num_epochs"] = 2
-    cfg.pop("eval_steps", None)
+    cfg["val_set_size"] = val_set_size
+    cfg["num_epochs"] = 1
+    cfg.pop("eval_steps", None)  # Evaluate once at the end of the epoch
    with open(config, "w") as f:
        yaml.dump(cfg, f)

    with open(data) as f:
-        data_truncated = f.readlines()[:1000]
+        data_truncated = f.readlines()[: train_set_size + val_set_size]
    with open(data, "w") as f:
        f.writelines(data_truncated)
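The "Prep config and data for CI" workflow step (elided in the diff above) presumably invokes this script along these lines (hypothetical command; the paths match those used in the training step):

```bash
python ci/prep_for_ci.py --config=config/mistral.yml --data=data/sqlqa.jsonl
```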

20 changes: 10 additions & 10 deletions config/codellama.yml
@@ -9,7 +9,7 @@ strict: false

datasets:
  # This will be the path used for the data when it is saved to the Volume in the cloud.
-  - path: my_data.jsonl
+  - path: data.jsonl
    ds_type: json
    type:
      # JSONL file contains question, context, answer fields per line.
@@ -28,9 +28,9 @@ val_set_size: 0.05
output_dir: ./lora-out

sequence_len: 4096
-sample_packing: true
+sample_packing: false
eval_sample_packing: false
-pad_to_sequence_len: true
+pad_to_sequence_len: false

adapter: lora
lora_model_dir:
@@ -46,15 +46,15 @@ wandb_watch:
wandb_run_id:

gradient_accumulation_steps: 1
-micro_batch_size: 16
-num_epochs: 5
-optimizer: adamw_bnb_8bit
+micro_batch_size: 32
+num_epochs: 4
+optimizer: adamw_torch
lr_scheduler: cosine
-learning_rate: 0.0002
+learning_rate: 0.0001

train_on_inputs: false
group_by_length: false
-bf16: true
+bf16: auto
fp16: false
tf32: false

@@ -70,8 +70,8 @@ flash_attention: true
warmup_steps: 10
eval_steps: 0.05
save_steps:
-debug: True
-deepspeed: /root/axolotl/deepspeed/zero3.json
+debug:
+deepspeed: /root/axolotl/deepspeed_configs/zero3_bf16.json
weight_decay: 0.0
fsdp:
fsdp_config:
21 changes: 10 additions & 11 deletions config/llama-2.yml
@@ -3,13 +3,13 @@ model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true

-load_in_8bit: true
+load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  # This will be the path used for the data when it is saved to the Volume in the cloud.
-  - path: my_data.jsonl
+  - path: data.jsonl
    ds_type: json
    type:
      # JSONL file contains question, context, answer fields per line.
@@ -28,14 +28,14 @@ val_set_size: 0.05
output_dir: ./lora-out

sequence_len: 4096
-sample_packing: true
+sample_packing: false
eval_sample_packing: false
-pad_to_sequence_len: true
+pad_to_sequence_len: false

adapter: lora
lora_model_dir:
-lora_r: 32
-lora_alpha: 16
+lora_r: 16
+lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
@@ -44,14 +44,13 @@ wandb_project:
wandb_entity:
wandb_watch:
wandb_run_id:
-wandb_log_model:

-gradient_accumulation_steps: 4
-micro_batch_size: 2
+gradient_accumulation_steps: 1
+micro_batch_size: 32
num_epochs: 4
-optimizer: adamw_bnb_8bit
+optimizer: adamw_torch
lr_scheduler: cosine
-learning_rate: 0.0002
+learning_rate: 0.0001

train_on_inputs: false
group_by_length: false
24 changes: 12 additions & 12 deletions config/mistral.yml
@@ -9,7 +9,7 @@ strict: false

datasets:
  # This will be the path used for the data when it is saved to the Volume in the cloud.
-  - path: my_data.jsonl
+  - path: data.jsonl
    ds_type: json
    type:
      # JSONL file contains question, context, answer fields per line.
@@ -24,13 +24,13 @@ datasets:
{instruction} [/INST]
dataset_prepared_path:
-val_set_size: 32
+val_set_size: 0.05
output_dir: ./lora-out

-sequence_len: 2048
-sample_packing: true
+sequence_len: 4096
+sample_packing: false
eval_sample_packing: false
-pad_to_sequence_len: true
+pad_to_sequence_len: false

adapter: lora
lora_model_dir:
@@ -46,13 +46,13 @@ wandb_watch:
wandb_run_id:

gradient_accumulation_steps: 1
-micro_batch_size: 16
-num_epochs: 1
-optimizer: adamw_bnb_8bit
+micro_batch_size: 32
+num_epochs: 4
+optimizer: adamw_torch
lr_scheduler: cosine
-learning_rate: 0.0002
+learning_rate: 0.0001

-bf16: true
+bf16: auto
fp16: false
tf32: false
train_on_inputs: false
@@ -64,12 +64,12 @@ resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
-flash_attention: false
+flash_attention: true

warmup_steps: 10
save_steps:
debug:
-deepspeed: /root/axolotl/deepspeed/zero3.json
+deepspeed: /root/axolotl/deepspeed_configs/zero3_bf16.json
weight_decay: 0.0
fsdp:
fsdp_config:
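One consequence of the standardized parameters worth spelling out: with `gradient_accumulation_steps: 1` and `micro_batch_size: 32` on the two-GPU setup from the README, the effective global batch size is 64 (illustrative arithmetic; the GPU count comes from the README's `N_GPUS = 2`):

```python
micro_batch_size = 32            # per-GPU, from the updated configs
gradient_accumulation_steps = 1  # from the updated configs
n_gpus = 2                       # N_GPUS in the README's train.py snippet
effective_batch_size = micro_batch_size * gradient_accumulation_steps * n_gpus
assert effective_batch_size == 64
```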