Add status messaging and endpoint to allow for client-side users to reason about model initialization and life cycle. #17

Open
tstescoTT opened this issue Sep 26, 2024 · 1 comment

@tstescoTT (Contributor)

Logs during setup, especially during tt_metal_cache generation on the first run, for example:

inference_api_server - ERROR - Model backend is hanging. time_since_response:=844.313969373703, time_since_status_msg:=844.30121231079

are confusing: from the inference server's perspective the model appears to be timing out, but it is actually still initializing, so this is expected behavior.

The correct way to deal with this is to use the status queue and push messages from the model init process to the inference server; the server can in turn expose a status endpoint so that downstream consumers can tell when the model is ready. A rough sketch of the idea is shown below.
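Minimal sketch of the approach, assuming a multiprocessing.Queue shared between the model backend process and the server process, and Flask for the HTTP side. The names here (model_backend, /status route, the status strings) are illustrative placeholders, not the existing inference_api_server API:

```python
import multiprocessing
import queue
import time

from flask import Flask, jsonify

# Illustrative status values the model backend could push while it initializes.
STATUS_INITIALIZING = "initializing"      # e.g. tt_metal_cache generation on first run
STATUS_WEIGHTS_LOADED = "weights_loaded"
STATUS_READY = "ready"

status_queue = multiprocessing.Queue()


def model_backend(status_q):
    """Model process: push a status message at each stage of initialization."""
    status_q.put({"status": STATUS_INITIALIZING, "ts": time.time()})
    # ... long-running setup, e.g. tt_metal_cache generation ...
    status_q.put({"status": STATUS_WEIGHTS_LOADED, "ts": time.time()})
    # ... warm-up ...
    status_q.put({"status": STATUS_READY, "ts": time.time()})


app = Flask(__name__)
_latest_status = {"status": "unknown", "ts": None}


def _drain_status_queue():
    """Pull any pending status messages off the queue, keeping only the latest."""
    global _latest_status
    while True:
        try:
            _latest_status = status_queue.get_nowait()
        except queue.Empty:
            break


@app.route("/status")
def get_status():
    """Status endpoint: lets downstream clients tell a slow init from a real hang."""
    _drain_status_queue()
    return jsonify(_latest_status)
```

The server-side watchdog could then base its "backend is hanging" decision on the timestamp of the most recent status message rather than on the last inference response, so a long first-run cache build no longer looks like a hang.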

@tstescoTT tstescoTT added the enhancement New feature or request label Sep 26, 2024
@tstescoTT tstescoTT self-assigned this Sep 26, 2024
@tstescoTT (Contributor, Author)

Should also explore how this status messaging would be used with vLLM.
