RuntimeError: Cannot load 'awq' weight when running Qwen2-VL-72B-Instruct-AWQ model #2944

Open
2 of 4 tasks
edesalve opened this issue Jan 23, 2025 · 0 comments
System Info

Hi all,

I encountered an issue when trying to run the Qwen/Qwen2-VL-72B-Instruct-AWQ model with the latest text-generation-inference Docker container (the same issue occurs with 3.0.1). The error message is:

```
RuntimeError: Cannot load `awq` weight, make sure the model is already quantized.
```

Here is the command I used to start the container:

```shell
docker run -d --runtime nvidia --gpus '"device=2"' --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-generation-inference:latest --model-id Qwen/Qwen2-VL-72B-Instruct-AWQ
```

I noticed a related issue (#2036), which seems to describe the same problem and was closed by #2233. However, the problem appears to persist.
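For what it's worth, the TGI launcher exposes a `--quantize` flag, and passing `--quantize awq` explicitly is a commonly suggested workaround for this error. Whether it actually resolves the issue for this model is an assumption on my part, not something I have verified; a sketch of the adjusted command:

```shell
# Sketch of a possible workaround (unverified for this model): pass
# --quantize awq explicitly so TGI selects the AWQ weight loader
# instead of trying to load the quantized checkpoint as unquantized.
docker run -d --runtime nvidia --gpus '"device=2"' --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id Qwen/Qwen2-VL-72B-Instruct-AWQ \
  --quantize awq
```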

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

```shell
docker pull ghcr.io/huggingface/text-generation-inference:latest
docker run -d --runtime nvidia --gpus '"device=2"' --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-generation-inference:latest --model-id Qwen/Qwen2-VL-72B-Instruct-AWQ
```

Expected behavior

The container should successfully start, and the model should load without errors.
