The first time, it says the model is loading. After a refresh, the model turned out to be loaded, so inference was fast that time. Moving to the community repo.
There are multiple known issues at play here:
1. Model loading is not really using correct information: api-inference doesn't know how to "guess" the model size properly, so the loading bar is not accurate. It will never be exact, but even a simple rule of thumb would make the loading bar longer and more representative (a sketch of one such rule of thumb follows this list).
2. First loads are always much longer due to downloading the weights.
3. Sometimes, depending on cluster conditions, creating the Docker container is slower than usual (it depends on how many GPUs are in use, how many nodes are available, etc.; creating a new node on demand is much slower than just launching the pod).
4. Inference still takes 5-6s, which feels very "slow" to us humans. Using xformers and fast attention should help a bit (expected to go down to ~3s); see the sketch at the end of this comment.
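For 1, a minimal sketch of what such a rule of thumb could look like, using the public huggingface_hub API to sum the repository's file sizes; the ~100 MB/s throughput figure and the helper name are illustrative assumptions, not what api-inference actually does:

```python
# Rough first-load estimate: total repo size / assumed download throughput.
# The 100 MB/s figure is an assumption for illustration only.
from huggingface_hub import HfApi

ASSUMED_BYTES_PER_S = 100 * 1024 * 1024

def estimate_first_load_seconds(repo_id: str) -> float:
    # files_metadata=True populates the size of every file in the repo
    info = HfApi().model_info(repo_id, files_metadata=True)
    total_bytes = sum(f.size or 0 for f in info.siblings)
    return total_bytes / ASSUMED_BYTES_PER_S

print(f"~{estimate_first_load_seconds('nitrosocke/classic-anim-diffusion'):.0f}s to first load")
```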
I think 1 and 4 are the ones we can most effectively do something about.
We're also working on adding tracing to the cluster so we have a better picture of 2 and 3.
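For 4, a minimal sketch of what enabling xformers memory-efficient attention looks like on the diffusers side, assuming xformers is installed and a CUDA GPU is available; the prompt is illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "nitrosocke/classic-anim-diffusion", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
# Swap in xformers' memory-efficient ("fast") attention kernels
pipe.enable_xformers_memory_efficient_attention()

# Illustrative prompt; this model card uses the "classic disney style" token
image = pipe("classic disney style magical princess").images[0]
```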
Describe the bug
It gets stuck at model loading.
Reproduction
Go to https://huggingface.co/nitrosocke/classic-anim-diffusion and submit a prompt for the first time.
https://www.loom.com/share/10fdb5920e0248cc8162e145f8957d77
Logs
No response
System info