RuntimeError: Cannot load 'awq' weight when running Qwen2-VL-72B-Instruct-AWQ model #2944

Open
2 of 4 tasks
edesalve opened this issue Jan 23, 2025 · 0 comments
System Info

Hi all,

I encountered an issue when trying to run the Qwen/Qwen2-VL-72B-Instruct-AWQ model with the latest text-generation-inference Docker container (the same issue occurs with 3.0.1). The error message is:

```
RuntimeError: Cannot load `awq` weight, make sure the model is already quantized.
```

Here is the command I used to start the container:

```shell
docker run -d --runtime nvidia --gpus '"device=2"' --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-generation-inference:latest --model-id Qwen/Qwen2-VL-72B-Instruct-AWQ
```

I noticed a related issue (#2036), which seems to describe the same problem and was closed by #2233. However, the problem appears to persist.
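For what it's worth, the TGI launcher exposes a `--quantize` flag, and passing `--quantize awq` explicitly is a commonly suggested workaround for this error. Whether it actually resolves the issue for this model is an assumption on my part, not something I have verified; a sketch of the adjusted command:

```shell
# Sketch of a possible workaround (unverified for this model): pass
# --quantize awq explicitly so TGI selects the AWQ weight loader
# instead of trying to load the quantized checkpoint as unquantized.
docker run -d --runtime nvidia --gpus '"device=2"' --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id Qwen/Qwen2-VL-72B-Instruct-AWQ \
  --quantize awq
```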

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

```shell
docker pull ghcr.io/huggingface/text-generation-inference:latest
docker run -d --runtime nvidia --gpus '"device=2"' --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-generation-inference:latest --model-id Qwen/Qwen2-VL-72B-Instruct-AWQ
```

Expected behavior

The container should successfully start, and the model should load without errors.
