Quantize a single tensor obtained from a float32 model #1364
Unanswered · Boltzmachine asked this question in Q&A · 0 replies
I have a model consisting of two parts: the first is a float32 encoder, and the second is a quantized LLM.
If I load the LLM in bfloat16, I can call encoder(x).bfloat16() and feed the result into the LLM. But when the LLM is loaded in 8-bit, I cannot find a corresponding way to convert the encoder's output.
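For concreteness, here is a minimal sketch of what I am after, assuming the LLM is quantized with bitsandbytes and a CUDA device is available. The quantize_blockwise/dequantize_blockwise calls are real bitsandbytes functions for quantizing a single tensor; whether the 8-bit LLM can accept such a quantized activation directly is exactly the open question.

```python
import torch
import bitsandbytes.functional as bnbF

# Stand-in for encoder(x): a float32 activation tensor.
x = torch.randn(1, 16, 4096, device="cuda", dtype=torch.float32)

# bfloat16 path (this works today): a plain dtype cast before the LLM.
x_bf16 = x.bfloat16()

# Hypothetical 8-bit path: blockwise absmax quantization of the activation.
# quantize_blockwise returns the uint8 payload plus a QuantState with the scales.
x_q, quant_state = bnbF.quantize_blockwise(x)

# Round-trip to inspect the quantization error; actually feeding the 8-bit LLM
# would require it to accept (x_q, quant_state) in some form, which is unclear.
x_hat = bnbF.dequantize_blockwise(x_q, quant_state)
print((x - x_hat).abs().max())
```

Dequantizing back to float32 right before the LLM obviously defeats the purpose, so I am looking for the intended way to hand an 8-bit model a tensor quantized like this.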