
Facing this error while trying to use unsloth's 4bit llama 3.2 vision 11B model for OCR task #1581

Open
Pavankurapati03 opened this issue Jan 27, 2025 · 1 comment
Labels: currently fixing (Am fixing now!)
Pavankurapati03 commented Jan 27, 2025

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit")
model = AutoModelForImageTextToText.from_pretrained("unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit")

from PIL import Image

# Open the image
image = Image.open("/teamspace/studios/this_studio/Screenshot 2024-12-22 132529.png")  # Replace with your image file path

# Ensure both image and text are passed correctly
inputs = processor(images=image, text="Extract the text from this image.", return_tensors="pt")

# Generate predictions
outputs = model.generate(**inputs)

# Decode the model's output
extracted_text = processor.decode(outputs[0], skip_special_tokens=True)
print("Extracted Text:", extracted_text)
```


After running this code I am getting this error:


```
ValueError                                Traceback (most recent call last)
Cell In[4], line 7
      4 image = Image.open("/teamspace/studios/this_studio/Screenshot 2024-12-22 132529.png")  # Replace with your image file path
      6 # Ensure both image and text are passed correctly
----> 7 inputs = processor(images=image, text="Extract the text from this image.", return_tensors="pt")
      9 # Generate predictions
     10 outputs = model.generate(**inputs)

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/models/mllama/processing_mllama.py:309, in MllamaProcessor.__call__(self, images, text, audio, videos, **kwargs)
    307     raise ValueError("No image were provided, but there are image tokens in the prompt")
    308 else:
--> 309     raise ValueError(
    310         f"The number of image token ({sum(n_images_in_text)}) should be the same as in the number of provided images ({sum(n_images_in_images)})"
    311     )
    313 if images is not None:
    314     image_features = self.image_processor(images, **images_kwargs)

ValueError: The number of image token (0) should be the same as in the number of provided images (1)
```
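As the traceback shows, `MllamaProcessor` compares the number of `<|image|>` tokens in the text against the number of images passed, and the plain prompt above contains zero. A likely workaround is to include the token in the prompt; a minimal sketch of the check and the fix (the chat-template call is the standard transformers pattern, not taken from this thread):

```python
# The processor raises because the prompt has zero "<|image|>" tokens while
# one image is passed. Prepending the literal token satisfies the check:
prompt = "<|image|>Extract the text from this image."

# Alternatively, let the processor's chat template insert the token for you
# (standard transformers usage; commented out here since it needs the model):
# messages = [
#     {"role": "user", "content": [
#         {"type": "image"},
#         {"type": "text", "text": "Extract the text from this image."},
#     ]},
# ]
# prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

# The processor's internal consistency check, sketched:
n_image_tokens = prompt.count("<|image|>")
n_images = 1  # one PIL image passed to the processor
print(n_image_tokens == n_images)  # prints True when the prompt has the token
```

With such a prompt, `processor(images=image, text=prompt, return_tensors="pt")` should no longer raise this `ValueError`.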

danielhanchen added the "currently fixing (Am fixing now!)" label on Jan 28, 2025
danielhanchen (Contributor) commented:

Hmm interesting bug - I will investigate! Sorry on the issue!
