load_in_4bit does not work #1263
Unanswered
PatchouliPatch asked this question in CATCH-ALL: alpha testing the `multi-backend-refactor`
The following is a modified version of the code from the Phi-3 Hugging Face model page.
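Roughly, the setup looks like this (a minimal sketch of the usual `transformers` + `BitsAndBytesConfig` pattern, not the exact script; the model id and generation settings here are placeholders):

```python
# Minimal sketch: Phi-3 loaded through transformers with bitsandbytes 4-bit quantization.
# The checkpoint name and generation parameters are illustrative, not the exact script.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # this path fails; load_in_8bit=True works
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Explain quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)  # never returns in the 4-bit case
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```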
For some reason, load_in_4bit does not work but load_in_8bit does; with 4-bit, the inference never seems to finish.
At full precision the output is:
And at 8-bit quant:
I'm using ROCm 6.0.3 and the stable branch of torch, after finding out that the latest nightly builds for 6.1.x have problems with quantization.
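For reference, a quick sanity check of the environment (a sketch; `torch.version.hip` is only populated on ROCm builds of torch):

```python
# Environment sanity check (sketch): confirm the ROCm build of torch,
# GPU visibility, and the installed bitsandbytes version.
import torch
import bitsandbytes as bnb

print("torch:", torch.__version__)
print("HIP runtime:", torch.version.hip)        # None on non-ROCm builds
print("GPU available:", torch.cuda.is_available())
print("bitsandbytes:", bnb.__version__)
```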
Any possible debugging steps?