Node discovery is currently implemented in one of two ways: either the nodes coordinate via a distributed filesystem, or they wait until the jobs are running and then use the Kubernetes API to check which node is running a particular head node script.
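For illustration, the distributed filesystem workaround might look roughly like the sketch below. It is not taken from Armada or from any existing job. It assumes every job in the set mounts the same shared directory, that rename is atomic on that filesystem, and that the head publishes a hostname and port. The names `publish_head_address`, `wait_for_head_address`, and `head.json` are hypothetical.

```python
import json
import os
import socket
import tempfile
import time
from pathlib import Path
from typing import Optional

HEAD_FILE = "head.json"  # hypothetical file name agreed on by head and workers


def publish_head_address(rendezvous_dir: Path, port: int, host: Optional[str] = None) -> dict:
    """Called by the head job: publish its address in the shared directory."""
    info = {"host": host or socket.gethostname(), "port": port}
    rendezvous_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first, then rename it into place, so that
    # workers never observe a partially written file (assumes atomic rename).
    fd, tmp_path = tempfile.mkstemp(dir=rendezvous_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(info, f)
    os.replace(tmp_path, rendezvous_dir / HEAD_FILE)
    return info


def wait_for_head_address(rendezvous_dir: Path, timeout_s: float = 300.0, poll_s: float = 1.0) -> dict:
    """Called by each worker job: poll until the head has published its address."""
    head_file = rendezvous_dir / HEAD_FILE
    deadline = time.monotonic() + timeout_s
    while True:
        if head_file.exists():
            return json.loads(head_file.read_text())
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"head address not published in {rendezvous_dir} within {timeout_s}s"
            )
        time.sleep(poll_s)
```

In this sketch the head job would call `publish_head_address(shared_dir, port)` once it is listening, and each worker would call `wait_for_head_address(shared_dir)` before connecting. The polling, the timeout handling, and the dependency on a shared mount are the kind of per-job plumbing that built-in labelling of head and worker nodes would make unnecessary.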
A supported way of launching a job set and labelling its head and worker nodes, so that they can be easily discovered from within the jobs themselves, would be very useful for supporting distributed ML on Armada.