self_attention.dense.weight int4 shape [4096, 2048] does not match fp16 shape [4096, 4096], which results in failing to set up the vLLM server
Is there an existing issue for this?
Current Behavior
self_attention.dense.weight int4 shape [4096, 2048] does not match the expected fp16 shape [4096, 4096], which causes vLLM server startup to fail.
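For reference, the [4096, 2048] shape is likely the packed int4 layout: ChatGLM's quantization stores two 4-bit values per int8 element, so the stored last dimension is half of the fp16 weight's [4096, 4096], while vLLM's loader apparently expects the unquantized layout. A minimal sketch to inspect the stored shapes, assuming the Hugging Face checkpoint `THUDM/chatglm2-6b-int4` and the `transformers` library:

```python
from transformers import AutoModel

# Load the int4-quantized checkpoint with its custom modeling code.
model = AutoModel.from_pretrained("THUDM/chatglm2-6b-int4", trust_remote_code=True)

# Print the stored shape and dtype of every self_attention.dense weight.
for name, tensor in model.state_dict().items():
    if "self_attention.dense.weight" in name:
        print(name, tuple(tensor.shape), tensor.dtype)
```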
Expected Behavior
chatglm2-6b-int4 can be deployed with vLLM.
Steps To Reproduce
none
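No steps were provided. A plausible minimal reproduction, assuming vLLM's offline Python API and the `THUDM/chatglm2-6b-int4` checkpoint (an assumed setup, not the reporter's actual command), might look like:

```python
from vllm import LLM, SamplingParams

# Attempt to load the int4 checkpoint; the shape-mismatch error is
# reported to occur while vLLM initializes the model weights.
llm = LLM(model="THUDM/chatglm2-6b-int4", trust_remote_code=True)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```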
Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`):
Anything else?
s