-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update maxtext checkpoitin test to add async #571
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Have you tested it on airflow yet? sometimes the cmds will parse differently. Would be nice to have one run in the environment.
By this do you mean testing in my own dev airflow environment? |
Yes. That's right. |
@RissyRan I'm running into many issues trying to get my XLML dev environment stable. But, I can see that the the currently running tests are running with async checkpointing successfully: https://efdb2a1d6c2c435b8b7de77690c286a9-dot-us-central1.composer.googleusercontent.com/dags/maxtext_checkpointing/grid?task_id=maxtext-checkpointing-stable-v4-8.run_model.wait_for_workload_completion&dag_run_id=scheduled__2025-01-25T10%3A00%3A00%2B00%3A00&tab=logs From the dev tests I did get running before my environment started crashing, I did see that the arg passing in the xlml test I updated had some issues, so I've instead explicitly passed args for everything. https://3e3ff8d06f3a490aa2f9778c110f8adf-dot-us-central1.composer.googleusercontent.com/dags/maxtext_checkpointing/grid?execution_date=2025-01-26+17%3A55%3A17.636853%2B00%3A00&task_id=maxtext-checkpointing-nightly-sync-2xv4-16&tab=details&dag_run_id=manual__2025-01-26T17%3A55%3A17.636853%2B00%3A00 shows that all the test cases I expect are being created (for both sync and async). And the arg parsing is less unusual now :D I'm thinking it might be easier to merge this and monitor the next run on XLML, rather than spend more time trying to fix my dev environment. If there are issues with the test, I can update. |
Thank you for the test! |
Description
Updated maxtext_checkpointing test to test both async and sync
Tests
Minor tweaks to calling maxtext e2e test. Ran the test locally with same config
Checklist
Before submitting this PR, please make sure (put X in square brackets):