[BUG/Help] Starting the INT4-quantized model with vLLM fails with a mismatch error: self_attention.dense.weight int4 shape [4096,2048] mismatch fp16 shape [4096, 4096] #680

Open
yjjiang11 opened this issue Jun 5, 2024 · 0 comments

Comments

@yjjiang11

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

self_attention.dense.weight int4 shape [4096,2048] mismatch fp16 shape [4096, 4096]
This causes the vLLM server startup to fail.

Expected Behavior

chatglm2-6b-int4 can be deployed with vLLM

Steps To Reproduce

none
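
No reproduction steps were provided. As a hypothetical sketch only, the error presumably comes from pointing vLLM directly at the int4 checkpoint, e.g. something like the snippet below; the model id `THUDM/chatglm2-6b-int4` and the offline `LLM` entrypoint are assumptions, not details from the report.

```python
# Hypothetical reproduction sketch; model id and API choice are assumptions.
from vllm import LLM

# vLLM instantiates the model from the fp16 ChatGLM2 config and then copies
# the checkpoint tensors into it. The int4 checkpoint stores packed weights,
# so self_attention.dense.weight arrives as [4096, 2048] instead of the
# [4096, 4096] the fp16 layer expects, and loading aborts.
llm = LLM(model="THUDM/chatglm2-6b-int4", trust_remote_code=True)
```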

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

s

yjjiang11 changed the title from "[BUG/Help] self_attention.dense.weight int4 shape [4096,2048] mismatch fp16 shape [4096, 4096]" to "[BUG/Help] Starting the INT4-quantized model with vLLM fails with a mismatch error: self_attention.dense.weight int4 shape [4096,2048] mismatch fp16 shape [4096, 4096]" on Jun 6, 2024