
quantized model using AWQ and LoRA weights #2703

Open
shuyuan-wang opened this issue Jan 17, 2025 · 2 comments
Labels: Investigating · Low Precision (issue about lower-bit quantization, including int8, int4, fp8) · triaged (issue has been triaged by maintainers)

Comments

@shuyuan-wang commented Jan 17, 2025

Hello,
Does TensorRT-LLM support a model quantized with AWQ together with LoRA weights trained on top of the quantized weights?

@nv-guomingz added the Low Precision label on Jan 20, 2025
@github-actions bot added the triaged and Investigating labels on Jan 20, 2025
@Tracin (Collaborator) commented Jan 21, 2025

I think we only support full-precision LoRA models for now.

@lodm94 commented Jan 22, 2025

An AutoAWQ checkpoint can be converted to TRT-LLM with LoRA support, which allows inference either with the adapters or with the foundation model via LoRA UIDs. See the LLaMA example; a sketch follows below.
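For reference, here is a minimal sketch of that flow, modeled on the LLaMA and quantization examples in the TensorRT-LLM repo. All paths are placeholders, and the exact flag names (`--qformat int4_awq`, `--lora_plugin`, `--lora_task_uids`, etc.) have shifted between TensorRT-LLM releases, so check the example in your checkout rather than copying this verbatim:

```bash
# 1) Produce an int4-AWQ TensorRT-LLM checkpoint, e.g. via the quantization
#    example script (or start from an AutoAWQ checkpoint converted to the
#    TRT-LLM checkpoint format).
python examples/quantization/quantize.py \
    --model_dir /path/to/llama-hf \
    --qformat int4_awq \
    --output_dir ./ckpt_awq

# 2) Build the engine with the LoRA plugin enabled, pointing at the adapter.
trtllm-build \
    --checkpoint_dir ./ckpt_awq \
    --output_dir ./engine_awq_lora \
    --lora_plugin float16 \
    --max_lora_rank 8 \
    --lora_dir /path/to/lora-adapter

# 3) Run with the adapter (task uid 0), or pass -1 to run the plain
#    foundation model with no LoRA applied.
python examples/run.py \
    --engine_dir ./engine_awq_lora \
    --tokenizer_dir /path/to/llama-hf \
    --input_text "Hello" \
    --max_output_len 50 \
    --lora_task_uids 0
```

The `--lora_task_uids` argument is what the comment above calls "lora uids": each uid selects one of the adapters supplied at build time, and -1 selects the foundation model without any adapter.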
