This project contains examples of using multi-client DTensor on GCP with a cluster of GPUs or TPUs.
Prerequisites:
- A gcloud environment set up on the local console:
gcloud auth login ...
gcloud config set project ...
- A GCS bucket that the GCE service account can write to. The bucket is used to demonstrate checkpointing. Set the bucket name used for the path prefix with
export GCS_BUCKET=<bucket_name>
or edit bootstrap.sh.
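
As a rough sketch of how the bucket setting could be checked before launching an example, the helper below fails early when no bucket name is available and otherwise prints a gs:// prefix. The function name checkpoint_prefix and the dtensor-checkpoints subdirectory are hypothetical and are not taken from bootstrap.sh; adapt them to whatever paths the examples actually use.

```shell
# Hypothetical helper, not part of bootstrap.sh.
# Prints a gs:// path prefix built from a bucket name.
# The bucket name comes from the first argument, or from
# the GCS_BUCKET environment variable if no argument is given.
checkpoint_prefix() {
  bucket="${1:-${GCS_BUCKET:-}}"
  if [ -z "$bucket" ]; then
    # Fail early instead of producing a path like gs:///...
    echo "GCS_BUCKET is not set" >&2
    return 1
  fi
  # "dtensor-checkpoints" is an assumed subdirectory name.
  printf 'gs://%s/dtensor-checkpoints\n' "$bucket"
}
```

With GCS_BUCKET exported as described above, calling checkpoint_prefix with no arguments prints the prefix for that bucket; passing a name explicitly overrides the environment variable.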
Because these examples depend on tf-nightly, periodically update the requirements.txt in each directory to a version known to work with them.