Quantize a single tensor obtained from a float32 model #1364
Unanswered · Boltzmachine asked this question in Q&A · 0 replies
I have a model consisting of two parts: the first is a float32 encoder, and the second is a quantized LLM.
If I load the LLM in bfloat16, I can call encoder(x).bfloat16() and feed the result into the LLM. But when the LLM is loaded in 8-bit, I cannot find a corresponding way to convert the encoder's output.
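For concreteness, here is a minimal sketch of what I am after, assuming the LLM is quantized with bitsandbytes and a CUDA device is available. The quantize_blockwise/dequantize_blockwise calls are real bitsandbytes functions for quantizing a single tensor; whether the 8-bit LLM can accept such a quantized activation directly is exactly the open question.

```python
import torch
import bitsandbytes.functional as bnbF

# Stand-in for encoder(x): a float32 activation tensor.
x = torch.randn(1, 16, 4096, device="cuda", dtype=torch.float32)

# bfloat16 path (this works today): a plain dtype cast before the LLM.
x_bf16 = x.bfloat16()

# Hypothetical 8-bit path: blockwise absmax quantization of the activation.
# quantize_blockwise returns the uint8 payload plus a QuantState with the scales.
x_q, quant_state = bnbF.quantize_blockwise(x)

# Round-trip to inspect the quantization error; actually feeding the 8-bit LLM
# would require it to accept (x_q, quant_state) in some form, which is unclear.
x_hat = bnbF.dequantize_blockwise(x_q, quant_state)
print((x - x_hat).abs().max())
```

Dequantizing back to float32 right before the LLM obviously defeats the purpose, so I am looking for the intended way to hand an 8-bit model a tensor quantized like this.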