JAX Hello World Multi-Node GKE H100 with GPUDirectTCPx tutorial #1236 #1237
Adding a few minor comments. Feel free to ping me when the PR is stable and ready for review, @parambole!
@@ -0,0 +1,53 @@
# JAX Multi-Node 'Hello World' on GKE + H100-80GB with GPUDirectTCPx
Typically we keep the instructions in the Google Cloud docs tutorial and point to it from here. This reduces duplication, so we only have to modify the instructions in one source of truth (the tutorial).
@@ -0,0 +1,14 @@
FROM python:3.10-slim
Could you add a workflow that, at minimum, runs `docker build` as a dry run to confirm that the image builds? You can find some examples in the `.github/workflows` directory.
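For illustration only, a minimal sketch of what such a build-only workflow might look like. The file name, workflow name, image tag, and `path/to/tutorial` are placeholders rather than paths from this PR, and the existing workflows in this repository may follow different conventions:

```yaml
# Hypothetical file: .github/workflows/jax-hello-world-build.yml
name: jax-hello-world-docker-build  # assumed name, not from the PR

on:
  pull_request:
    paths:
      - "path/to/tutorial/**"  # placeholder: directory that contains the Dockerfile

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # Check out the PR contents so the Dockerfile is available.
      - uses: actions/checkout@v4
      # Build only; the image is never pushed, so this acts as a dry run.
      - name: Build the image
        run: docker build --tag jax-hello-world:ci path/to/tutorial  # placeholder path
```

Restricting the trigger with `paths` would keep the check from running on unrelated changes; dropping that filter would make it run on every pull request.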
@parambole hi! Friendly ping for these samples
Description
Adds JAX Hello World Multi-Node GKE H100 with GPUDirectTCPx tutorial
Tasks