Make training on the MPS device work #131
Conversation
@epwalsh, some thoughts? This now has a However, another view could be that "your local machine" is a "cluster" like any other. Then we wouldn't need another command. But it's a little weird that the special automatic munging of configs we do would only happen when your cluster is set to "local". It's also hard to write the code for that.
I think having the separate command is fine
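The trade-off discussed above ("local machine as just another cluster" vs. a separate launch command) could be sketched roughly as follows. All names here (`prepare_config`, the config keys) are hypothetical illustrations, not the actual olmo-core API:

```python
# Hypothetical sketch: treat "local" as just another cluster name and
# apply the special config munging only for that entry, instead of
# routing local runs through a separate launch command.
def prepare_config(config: dict, cluster: str) -> dict:
    cfg = dict(config)  # don't mutate the caller's config
    if cluster == "local":
        # Single-machine munging, e.g. for an Apple-silicon laptop:
        # switch to the MPS device and shrink the global batch size.
        cfg["device"] = "mps"
        cfg["global_batch_size"] = min(cfg.get("global_batch_size", 1024), 8)
    return cfg
```

The awkwardness noted in the comment shows up here: every other cluster name passes through untouched, so "local" becomes a magic string with behavior unlike any real cluster, which is one argument for keeping the separate command.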
@@ -76,48 +73,6 @@ def build_trainer_config(common: CommonComponents) -> TrainerConfig:
        cancel_check_interval=10,
    ),
)
.with_callback(
Is this change intentional?
Yes. That callback doesn't initialize properly because some of those datasets don't exist, and none of the other sizes have it.
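The failure mode described here (a callback that can't initialize because its backing datasets are missing) suggests a guard pattern: only register the callback when its data actually exists. A minimal sketch, with entirely hypothetical names (`TrainerConfig.with_callback` is modeled on the diff above, but the rest is illustrative):

```python
# Hypothetical sketch: skip registering an eval callback whose
# dataset files are missing, so config construction doesn't fail
# on machines (e.g. a local MPS laptop) that lack them.
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class TrainerConfig:
    callbacks: list = field(default_factory=list)

    def with_callback(self, cb) -> "TrainerConfig":
        self.callbacks.append(cb)
        return self


def maybe_add_eval_callback(config: TrainerConfig, dataset_paths: list[str]) -> TrainerConfig:
    """Attach the (hypothetical) eval callback only if every dataset exists."""
    if all(Path(p).exists() for p in dataset_paths):
        return config.with_callback({"type": "downstream_eval", "datasets": dataset_paths})
    return config  # silently skip: some datasets are absent locally
```

Whether silent skipping is better than failing loudly is a design choice; for a local-development config like this one, skipping matches the PR's goal of making the config buildable everywhere.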
LGTM, other than the changes to the 1B config
This PR has been released in v1.8.0.
No description provided.