
Facing this error while trying to use unsloth's 4bit llama 3.2 vision 11B model for OCR task #1581

Open
Pavankurapati03 opened this issue Jan 27, 2025 · 1 comment
Labels: currently fixing (Am fixing now!)
Pavankurapati03 commented Jan 27, 2025

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit")
model = AutoModelForImageTextToText.from_pretrained("unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit")

from PIL import Image

# Open the image
image = Image.open("/teamspace/studios/this_studio/Screenshot 2024-12-22 132529.png")  # Replace with your image file path

# Ensure both image and text are passed correctly
inputs = processor(images=image, text="Extract the text from this image.", return_tensors="pt")

# Generate predictions
outputs = model.generate(**inputs)

# Decode the model's output
extracted_text = processor.decode(outputs[0], skip_special_tokens=True)
print("Extracted Text:", extracted_text)
```


After running this code I am getting this error:


```
ValueError                                Traceback (most recent call last)
Cell In[4], line 7
      4 image = Image.open("/teamspace/studios/this_studio/Screenshot 2024-12-22 132529.png")  # Replace with your image file path
      6 # Ensure both image and text are passed correctly
----> 7 inputs = processor(images=image, text="Extract the text from this image.", return_tensors="pt")
      9 # Generate predictions
     10 outputs = model.generate(**inputs)

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/models/mllama/processing_mllama.py:309, in MllamaProcessor.__call__(self, images, text, audio, videos, **kwargs)
    307     raise ValueError("No image were provided, but there are image tokens in the prompt")
    308 else:
--> 309     raise ValueError(
    310         f"The number of image token ({sum(n_images_in_text)}) should be the same as in the number of provided images ({sum(n_images_in_images)})"
    311     )
    313 if images is not None:
    314     image_features = self.image_processor(images, **images_kwargs)

ValueError: The number of image token (0) should be the same as in the number of provided images (1)
```
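As the traceback shows, `MllamaProcessor` compares the number of `<|image|>` tokens in the text against the number of images passed, and the plain prompt above contains zero. A likely workaround is to include the token in the prompt; a minimal sketch of the check and the fix (the chat-template call is the standard transformers pattern, not taken from this thread):

```python
# The processor raises because the prompt has zero "<|image|>" tokens while
# one image is passed. Prepending the literal token satisfies the check:
prompt = "<|image|>Extract the text from this image."

# Alternatively, let the processor's chat template insert the token for you
# (standard transformers usage; commented out here since it needs the model):
# messages = [
#     {"role": "user", "content": [
#         {"type": "image"},
#         {"type": "text", "text": "Extract the text from this image."},
#     ]},
# ]
# prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

# The processor's internal consistency check, sketched:
n_image_tokens = prompt.count("<|image|>")
n_images = 1  # one PIL image passed to the processor
print(n_image_tokens == n_images)  # prints True when the prompt has the token
```

With such a prompt, `processor(images=image, text=prompt, return_tensors="pt")` should no longer raise this `ValueError`.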

danielhanchen added the "currently fixing (Am fixing now!)" label on Jan 28, 2025
danielhanchen (Contributor) commented:

Hmm interesting bug - I will investigate! Sorry on the issue!
