Logs during setup, especially during tt_metal_cache generation on the first run, are confusing, for example:
inference_api_server - ERROR - Model backend is hanging. time_since_response:=844.313969373703, time_since_status_msg:=844.30121231079
From the inference server's perspective the model backend appears to be timing out, but it is actually still initializing, so this is expected behavior.
The correct way to handle this is to use the status queue: the model backend pushes messages about its init progress to the inference server, and the server can in turn expose a status endpoint so downstream consumers know when the model is ready.
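A minimal sketch of that idea, assuming a multiprocessing.Queue shared between the backend process and the server; the names here (model_backend, drain_status_queue, the /status route, port 7000) are illustrative, not the actual tt-inference-server API:

```python
# Hypothetical sketch only: not the real inference_api_server code.
import multiprocessing
import threading
import time

from flask import Flask, jsonify


def model_backend(status_queue: multiprocessing.Queue) -> None:
    """Backend process: push init-progress messages instead of going silent."""
    status_queue.put({"status": "initializing", "detail": "generating tt_metal_cache"})
    time.sleep(10)  # placeholder for the long first-run initialization work
    status_queue.put({"status": "ready", "detail": None})


app = Flask(__name__)
latest_status = {"status": "starting", "detail": None}


def drain_status_queue(status_queue: multiprocessing.Queue) -> None:
    """Server side: remember the most recent status message from the backend."""
    global latest_status
    while True:
        latest_status = status_queue.get()  # blocks until the backend reports


@app.route("/status")
def status():
    # Downstream consumers poll this endpoint to know when the model is ready.
    return jsonify(latest_status)


if __name__ == "__main__":
    q = multiprocessing.Queue()
    multiprocessing.Process(target=model_backend, args=(q,), daemon=True).start()
    threading.Thread(target=drain_status_queue, args=(q,), daemon=True).start()
    app.run(port=7000)
```

With something like this, polling `GET /status` would return `{"status": "initializing", ...}` during cache generation and flip to `{"status": "ready", ...}` once init completes, so a slow first run is distinguishable from a genuinely hung backend.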