
Update axolotl image and other dependencies #28

Merged · 28 commits merged into main from michael/update-versions on Feb 9, 2024
Conversation

@mwaskom (Collaborator) commented Feb 6, 2024

The primary change here is to update the version of the axolotl container to correspond to the v0.4.0 release (a rough sketch of the image pin follows the list below). There are also some changes directly downstream of that:

  • We no longer install an older checkout of transformers
  • Mistral no longer hangs on evaluation with flash_attention enabled
  • We've updated the deepspeed config paths
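
For reference, pinning the container version in the Modal app looks roughly like the sketch below. The specific image tag and the extra dependency are assumptions for illustration, not the exact values in this PR:

```python
import modal

# Assumed tag -- check the axolotl releases for the image that actually
# corresponds to v0.4.0; this exact string is a placeholder.
AXOLOTL_IMAGE_TAG = "winglian/axolotl:main-py3.10-cu118-2.0.1"

axolotl_image = (
    modal.Image.from_registry(AXOLOTL_IMAGE_TAG, add_python="3.10")
    # With v0.4.0 we no longer need to pin an older transformers checkout,
    # so no pip install of a specific transformers commit is needed here.
    .pip_install("huggingface_hub")  # illustrative extra dependency
)
```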

Additionally, I've made some updates to the configs that aren't strictly related to the axolotl version, but arose from the testing that I was doing:

  • I've disabled sample_packing, which seems to be net harmful for the medium-sized finetuning dataset we use in our demonstration.
  • Mostly as a result of the above, I downgraded the base GPU request to two 40 GB A100s, which are easier to get.
  • I aligned the configs between the three models (mainly this means removing quantization from Llama-2). I suspect it's confusing to use different configs for different base models; new users could interpret that as "you train Mistral at half native precision but have to use quantization for Llama", or something similar. (A sketch of the resulting alignment check follows this list.)
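
As a rough illustration of what "aligned" means here, a check like the following should pass after these changes. The config filenames and the exact key list are assumptions for the sketch, not the repo's actual paths:

```python
import yaml

# Hypothetical config paths -- the actual filenames in the repo may differ.
CONFIG_PATHS = ["config/llama-2.yml", "config/mistral.yml", "config/codellama.yml"]

# Settings we expect to be identical across all three base-model configs.
SHARED_KEYS = {
    "sample_packing": False,  # disabled in this PR
    "load_in_8bit": False,    # no quantization for any base model
    "load_in_4bit": False,
}

for path in CONFIG_PATHS:
    with open(path) as f:
        cfg = yaml.safe_load(f)
    for key, expected in SHARED_KEYS.items():
        # Treat a missing key as equivalent to its falsy default.
        assert cfg.get(key, False) == expected, f"{path}: {key}={cfg.get(key)}"
```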

Finally, I updated some of the CI that I added in a previous PR:

  • I removed some of the configuration changes that made the CI training "lighter weight"; now the only change is running on a truncated dataset for a single epoch, with just one evaluation at the end of the epoch.
  • I added an assertion on the validation loss (sketched below). This involves some pretty hacky stuff, as I don't see any obvious way to get structured results from the axolotl outputs (without going through mlflow or wandb, which maybe would have been better).

Despite the CI run being fairly lightweight and taking just a couple of minutes, the models it trains seem pretty good! (evaluation loss of ≈0.06 for Mistral).
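
For context, the "hacky stuff" behind the validation-loss assertion amounts to something like the sketch below. It assumes axolotl's underlying HF Trainer prints metric dicts (e.g. {'eval_loss': 0.0612, ...}) into the captured logs; the threshold here is illustrative, not the value used in CI:

```python
import re

EVAL_LOSS_THRESHOLD = 0.25  # illustrative ceiling; tune per model


def check_eval_loss(log_text: str, threshold: float = EVAL_LOSS_THRESHOLD) -> float:
    """Pull the last reported eval_loss out of axolotl's textual logs.

    There's no obvious structured-results API, so we fall back to
    scraping the printed metric dict with a regex.
    """
    matches = re.findall(r"'eval_loss':\s*([0-9.]+)", log_text)
    assert matches, "no eval_loss found in training logs"
    loss = float(matches[-1])
    assert loss < threshold, f"eval_loss {loss} exceeded threshold {threshold}"
    return loss
```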


@mwaskom requested a review from gongy on February 7, 2024 at 21:40
@mwaskom merged commit 62cfb65 into main on February 9, 2024 · 3 checks passed
@mwaskom deleted the michael/update-versions branch on February 9, 2024 at 20:14