Search before asking
I searched the issues and found no similar issues.
KubeRay Component
ray-operator
What happened + What you expected to happen
Currently, if the submitter pod fails to fetch log information from the Ray cluster through the GET API, it fails immediately. This may not match what users expect. The submitter should keep retrying instead of failing on the first error.
I am also curious why KubeRay still builds the address by concatenating a domain name instead of accessing the cluster directly by IP address. Using the IP address would skip the DNS lookup, so the connection should be more stable.
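To make the requested behavior concrete, here is a minimal sketch of retrying a log fetch with exponential backoff instead of failing on the first network error. It is not the KubeRay or Ray implementation: `fetch_with_retry`, its parameters, and the `fetch` callable (a stand-in for whatever issues the GET request for logs) are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[], str],
    max_attempts: Optional[int] = None,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch() until it succeeds, backing off exponentially between tries.

    fetch is a hypothetical stand-in for the GET log request.
    max_attempts=None means retry indefinitely.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return fetch()
        except (ConnectionError, TimeoutError) as exc:
            # Give up only when a finite attempt budget is exhausted.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            print(f"log fetch failed (attempt {attempt}): {exc}; "
                  f"retrying in {delay:.1f}s")
            sleep(delay)
```

Only transient network errors (`ConnectionError`, `TimeoutError`) are retried here; any other exception still propagates immediately. Whether retries should be unbounded or capped is a design choice that would need to be settled in the PR.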
Reproduction script
Submit a RayJob, then have the GET log API call fail for network reasons.
Anything else
No response
Are you willing to submit a PR?
Yes I am willing to submit a PR!