Add status messaging and endpoint to allow for client-side users to reason about model initialization and life cycle. #17

Open
tstescoTT opened this issue Sep 26, 2024 · 1 comment

@tstescoTT (Contributor)

Logs during setup, especially during tt_metal_cache generation on the first run, for example:

inference_api_server - ERROR - Model backend is hanging. time_since_response:=844.313969373703, time_since_status_msg:=844.30121231079

are confusing: from the inference server's perspective the model appears to be timing out, but it is actually still initializing, so this is expected behavior.

The correct way to deal with this is to use the status queue and push messages from the model init process to the inference server; the server can in turn expose a status endpoint so that downstream consumers can tell when the model is ready. A rough sketch of the idea is shown below.
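Minimal sketch of the approach, assuming a multiprocessing.Queue shared between the model backend process and the server process, and Flask for the HTTP side. The names here (model_backend, /status route, the status strings) are illustrative placeholders, not the existing inference_api_server API:

```python
import multiprocessing
import queue
import time

from flask import Flask, jsonify

# Illustrative status values the model backend could push while it initializes.
STATUS_INITIALIZING = "initializing"      # e.g. tt_metal_cache generation on first run
STATUS_WEIGHTS_LOADED = "weights_loaded"
STATUS_READY = "ready"

status_queue = multiprocessing.Queue()


def model_backend(status_q):
    """Model process: push a status message at each stage of initialization."""
    status_q.put({"status": STATUS_INITIALIZING, "ts": time.time()})
    # ... long-running setup, e.g. tt_metal_cache generation ...
    status_q.put({"status": STATUS_WEIGHTS_LOADED, "ts": time.time()})
    # ... warm-up ...
    status_q.put({"status": STATUS_READY, "ts": time.time()})


app = Flask(__name__)
_latest_status = {"status": "unknown", "ts": None}


def _drain_status_queue():
    """Pull any pending status messages off the queue, keeping only the latest."""
    global _latest_status
    while True:
        try:
            _latest_status = status_queue.get_nowait()
        except queue.Empty:
            break


@app.route("/status")
def get_status():
    """Status endpoint: lets downstream clients tell a slow init from a real hang."""
    _drain_status_queue()
    return jsonify(_latest_status)
```

The server-side watchdog could then base its "backend is hanging" decision on the timestamp of the most recent status message rather than on the last inference response, so a long first-run cache build no longer looks like a hang.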

@tstescoTT tstescoTT added the enhancement New feature or request label Sep 26, 2024
@tstescoTT tstescoTT self-assigned this Sep 26, 2024
@tstescoTT (Contributor, Author)

Should also explore how this status messaging would be used with vLLM.
