v1.5.0
What's new
Added
- Added Google Cloud support for `list_directory()` and `clear_directory()`.
- Added `CometCallback` for logging training runs to Comet.ml.
- Added `DataMixBase` class, to allow extending to new data mix groups.
- Added support for MoE-based models.
- Added method `DataLoaderBase.get_mock_batch()`.
- Trainer now starts with a dry-run of a fake batch created by `DataLoaderBase.get_mock_batch()`.
- Added `Callback.pre_backward()`, `.pre_eval_batch()`, and `.post_eval_batch()` methods.
- Added `Trainer.model_forward()`, `.get_losses()`, and `.eval_batch()` methods.
- Added a new `TransformerActivationCheckpointingMode`, "selected_ops" (requires torch 2.5 or newer).
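The dry-run behavior and the new `pre_backward()` hook can be pictured with a toy sketch. This is not the library's implementation: `ToyDataLoader`, `ToyTrainer`, `ToyCallback`, the `train_batch()` and `fit()` methods, and the `dry_run` flag are hypothetical stand-ins, and only the names `get_mock_batch()`, `model_forward()`, and `pre_backward()` are borrowed from the notes above. It also assumes that callbacks fire during the dry run, which the release notes do not state.

```python
from typing import Dict, List

Batch = Dict[str, List[List[int]]]


class ToyCallback:
    """Hypothetical callback that records which hooks fire, in order."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def pre_backward(self, batch: Batch, loss: float) -> None:
        self.events.append("pre_backward")


class ToyDataLoader:
    """Hypothetical loader; only the mock-batch idea comes from the notes."""

    def __init__(self, batch_size: int, seq_len: int) -> None:
        self.batch_size = batch_size
        self.seq_len = seq_len

    def get_mock_batch(self) -> Batch:
        # Same shape as a real batch, filled with placeholder token IDs.
        return {"input_ids": [[0] * self.seq_len for _ in range(self.batch_size)]}


class ToyTrainer:
    """Hypothetical trainer showing a dry run before real training steps."""

    def __init__(self, loader: ToyDataLoader, callbacks: List[ToyCallback]) -> None:
        self.loader = loader
        self.callbacks = callbacks
        self.global_step = 0

    def model_forward(self, batch: Batch) -> float:
        # Stand-in for a forward pass: the "loss" is the number of tokens seen.
        return float(sum(len(row) for row in batch["input_ids"]))

    def train_batch(self, batch: Batch, dry_run: bool = False) -> float:
        loss = self.model_forward(batch)
        for callback in self.callbacks:
            callback.pre_backward(batch, loss)  # assumed to fire in the dry run too
        if not dry_run:
            self.global_step += 1  # the dry run does not count as a step
        return loss

    def fit(self, batches: List[Batch]) -> None:
        # Exercise the whole step once on a fake batch so shape or memory
        # problems surface before any real data is consumed.
        self.train_batch(self.loader.get_mock_batch(), dry_run=True)
        for batch in batches:
            self.train_batch(batch)
```

With one real batch, this toy trainer runs two forward passes (the dry run plus the real step) but advances `global_step` only once.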
Changed
- `BeakerLaunchConfig.setup_steps` should now include steps to clone your repo (which it will by default). This change allows support for private repos.
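As a rough illustration of what "steps to clone your repo" might look like, the sketch below builds a list of shell commands. It assumes that setup steps are plain shell command strings, which these notes do not confirm. The helper `build_setup_steps()`, its parameters, and the example URL are hypothetical and are not part of the library.

```python
import shlex
from typing import List


def build_setup_steps(repo_url: str, ref: str, workdir: str = "repo") -> List[str]:
    """Return shell commands that clone `repo_url` and check out `ref`.

    Hypothetical helper for illustration only; not the library's API.
    """
    # Quote each argument so unusual characters cannot break the command line.
    url = shlex.quote(repo_url)
    directory = shlex.quote(workdir)
    return [
        f"git clone {url} {directory}",
        f"cd {directory}",
        f"git checkout {shlex.quote(ref)}",
    ]


steps = build_setup_steps("https://example.com/org/project.git", "main")
```

Because the clone commands are ordinary setup steps under the caller's control, a private repository could presumably be handled by substituting an authenticated clone command in the first step.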
Fixed
- `prepare_cli_environment()` now calls `add_cached_path_clients()`.
- Removed an unnecessary host-device sync.
Commits
984eb26 Update README.md
0f0d282 Update README.md
310866e Add FP8 numbers for the 13B
425f7db Add "selected_ops" transformer AC mode (#71)
d90292e Move transformer config components to its own submodule
4d3b231 Add support for MoE models with megablocks (#60)
6e32043 Add Google Cloud support for more `io` functions (#69)
5af60ba Avoid an unnecessary host-device sync when created initial loss tensors (#68)
ad4c8bb Switch to comet callback in official train scripts
d90f5da Add comet API key to launch config
0c75ef6 Do a dry-run batch before starting training (#67)
71bc5c8 Add `save_state_dict` function
9c25aed Update the Comet.ml callback (#66)
6e4ee4e Add BaseDataMix class (#65)
54a74c3 Add a Comet.ml trainer callback (#64)
9ba0e63 Update base image with newer torch and flash-attn versions (#63)
97172fc avoid omegaconf interpolation in setup steps
48892ee include clone commands in setup steps (#62)