
Update axolotl image and other dependencies #28

Merged · 28 commits merged into main from michael/update-versions on Feb 9, 2024
Conversation

@mwaskom (Collaborator) commented Feb 6, 2024

The primary change here is to update the version of the axolotl container to correspond to the v0.4.0 release (a rough sketch of the image pin follows the list below). There are also some changes directly downstream of that:

  • We no longer install an older checkout of transformers
  • Mistral no longer hangs on evaluation with flash_attention enabled
  • We've updated the deepspeed config paths
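
For reference, pinning the container version in the Modal app looks roughly like the sketch below. The specific image tag and the extra dependency are assumptions for illustration, not the exact values in this PR:

```python
import modal

# Assumed tag -- check the axolotl releases for the image that actually
# corresponds to v0.4.0; this exact string is a placeholder.
AXOLOTL_IMAGE_TAG = "winglian/axolotl:main-py3.10-cu118-2.0.1"

axolotl_image = (
    modal.Image.from_registry(AXOLOTL_IMAGE_TAG, add_python="3.10")
    # With v0.4.0 we no longer need to pin an older transformers checkout,
    # so no pip install of a specific transformers commit is needed here.
    .pip_install("huggingface_hub")  # illustrative extra dependency
)
```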

Additionally, I've made some updates to the configs that aren't strictly related to the axolotl version, but arose from the testing that I was doing:

  • I've disabled sample_packing, which seems to be net harmful for the medium-sized finetuning dataset we use in our demonstration.
  • Mostly as a result of the above, I downgraded the base GPU request to two 40 GB A100s, which are easier to get.
  • I aligned the configs between the three models (mainly this means removing quantization from Llama-2). I suspect it's confusing to use different configs for different base models; new users could interpret that as "you train Mistral at half native precision but have to use quantization for Llama", or something similar. (A sketch of the resulting alignment check follows this list.)
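
As a rough illustration of what "aligned" means here, a check like the following should pass after these changes. The config filenames and the exact key list are assumptions for the sketch, not the repo's actual paths:

```python
import yaml

# Hypothetical config paths -- the actual filenames in the repo may differ.
CONFIG_PATHS = ["config/llama-2.yml", "config/mistral.yml", "config/codellama.yml"]

# Settings we expect to be identical across all three base-model configs.
SHARED_KEYS = {
    "sample_packing": False,  # disabled in this PR
    "load_in_8bit": False,    # no quantization for any base model
    "load_in_4bit": False,
}

for path in CONFIG_PATHS:
    with open(path) as f:
        cfg = yaml.safe_load(f)
    for key, expected in SHARED_KEYS.items():
        # Treat a missing key as equivalent to its falsy default.
        assert cfg.get(key, False) == expected, f"{path}: {key}={cfg.get(key)}"
```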

Finally, I updated some of the CI that I added in a previous PR:

  • I removed some of the configuration changes that made the CI training "lighter weight"; now the only change is running on a truncated dataset for a single epoch, with just one evaluation at the end of the epoch.
  • I added an assertion on the validation loss (sketched below). This involves some pretty hacky stuff, as I don't see any obvious way to get structured results from the axolotl outputs (without going through mlflow or wandb, which maybe would have been better).

Despite the CI run being fairly lightweight and taking just a couple of minutes, the models it trains seem pretty good! (evaluation loss of ≈0.06 for Mistral).
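
For context, the "hacky stuff" behind the validation-loss assertion amounts to something like the sketch below. It assumes axolotl's underlying HF Trainer prints metric dicts (e.g. {'eval_loss': 0.0612, ...}) into the captured logs; the threshold here is illustrative, not the value used in CI:

```python
import re

EVAL_LOSS_THRESHOLD = 0.25  # illustrative ceiling; tune per model


def check_eval_loss(log_text: str, threshold: float = EVAL_LOSS_THRESHOLD) -> float:
    """Pull the last reported eval_loss out of axolotl's textual logs.

    There's no obvious structured-results API, so we fall back to
    scraping the printed metric dict with a regex.
    """
    matches = re.findall(r"'eval_loss':\s*([0-9.]+)", log_text)
    assert matches, "no eval_loss found in training logs"
    loss = float(matches[-1])
    assert loss < threshold, f"eval_loss {loss} exceeded threshold {threshold}"
    return loss
```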


@mwaskom requested a review from gongy on February 7, 2024 at 21:40
@mwaskom merged commit 62cfb65 into main on February 9, 2024 · 3 checks passed
@mwaskom deleted the michael/update-versions branch on February 9, 2024 at 20:14