Adding LLaVA-Phi-3-Mini #12
Conversation
It works, but the prompt template apparently is not correct. Maybe wait for an official update from HF for apply_chat_template?

```
air@MacBook-Air-van-Air mlx-vlm % python3.10 -m mlx_vlm.generate --model xtuner/llava-phi-3-mini-hf --max-tokens 100 --temp 0.0
model-00002-of-00002.safetensors: 100% 3.31G/3.31G [11:01<00:00, 5.01MB/s]
model-00001-of-00002.safetensors: 100% 4.99G/4.99G [13:49<00:00, 6.01MB/s]
Fetching 11 files: 100% 11/11 [13:49<00:00, 75.41s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
==========
Image: http://images.cocodataset.org/val2017/000000039769.jpg
Prompt: <s><|user|>
<image>
What are these?<|end|>
<|assistant|>
These are two cats sleeping on a pink couch.<|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|>
==========
Prompt: 0.969 tokens-per-sec
Generation: 1.355 tokens-per-sec
```
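For reference, a minimal sketch of the template check being discussed, assuming the repo ships a chat template for its tokenizer (the model id is taken from the command above; the message layout is illustrative):

```python
# Sketch: build the prompt with the tokenizer's own chat template so it can be
# compared against the hard-coded prompt shown in the log above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xtuner/llava-phi-3-mini-hf")
messages = [{"role": "user", "content": "<image>\nWhat are these?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # should end with the <|assistant|> turn if the template is correct
```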
Awesome, great work!
What do you mean by "the chat template is wrong"?
Thank you!
Hmm, the model is working fine. The problem is the eos_token: if you patch the tokenizer it will work :) tokenizer.eos_token = "<|end|>"
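A minimal sketch of that patch, assuming a standard Hugging Face tokenizer object (inside mlx-vlm the object to patch may live elsewhere):

```python
# Sketch of the suggested fix: make generation stop at Phi-3's end-of-turn token.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xtuner/llava-phi-3-mini-hf")
print(tokenizer.eos_token)       # whatever the checkpoint currently ships
tokenizer.eos_token = "<|end|>"  # stop generation at <|end|> instead of repeating it
```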
Tried two options (Option 1 and Option 2) in llavaPhi.Model from_pretrained, at the end right before return model. Unfortunately, neither worked.
The tokenizer is in the processor. Llava-type models usually have it set up this way. Check: https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/utils.py#L714-L719
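A sketch of what that looks like, assuming the checkpoint loads with AutoProcessor the way other Llava-style models do:

```python
# Sketch: Llava-style checkpoints bundle the tokenizer inside the processor,
# so the patch has to go through processor.tokenizer (illustrative only).
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("xtuner/llava-phi-3-mini-hf")
processor.tokenizer.eos_token = "<|end|>"
```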
Got it working with
However, this also overrides the llava-1.5 model, right? So I tried to add an extra condition, but I cannot check model_type == 'llava':
If you know how to access the right component where the model config's main architecture == 'llama', could you implement this? Then we can merge.
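An illustrative sketch of such a guard; the exact config fields that separate the Phi-3 checkpoint from llava-1.5 are assumptions here and would need to be checked against the actual config.json:

```python
# Sketch of the conditional patch discussed above (illustrative only).
import json
from pathlib import Path


def maybe_patch_eos(model_path: str, processor) -> None:
    config = json.loads((Path(model_path) / "config.json").read_text())
    text_config = config.get("text_config", {})
    # Assumption: the Phi-3 backbone reports a "llama"-style model_type and is the
    # only checkpoint whose vocabulary defines <|end|>, so llava-1.5 is left alone.
    if (
        text_config.get("model_type") == "llama"
        and "<|end|>" in processor.tokenizer.get_vocab()
    ):
        processor.tokenizer.eos_token = "<|end|>"
```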
Great job! Let's keep everything as it was. I would prefer that we patch the tokenizer directly and save it once we convert the model to the MLX hub :) That way we don't have to add that extra logic here.
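A sketch of that conversion-time approach, with a hypothetical helper name; the real mlx-vlm convert path may differ:

```python
# Sketch: patch the tokenizer once at conversion time and save it next to the
# converted weights, so the runtime code needs no model-specific logic.
from transformers import AutoProcessor


def fix_eos_and_save(hf_repo: str, out_dir: str) -> None:
    processor = AutoProcessor.from_pretrained(hf_repo)
    processor.tokenizer.eos_token = "<|end|>"  # bake the fix into the saved files
    processor.save_pretrained(out_dir)         # converted MLX weights go alongside
```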
Btw, I just added tests recently in #13. If you could add the test cases once you finish, that would be great.
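A hypothetical test sketch (the layout of the tests added in #13 isn't shown here, so names and structure are illustrative):

```python
# Sketch: verify that the patched tokenizer maps its EOS token to <|end|>.
from transformers import AutoProcessor


def test_llava_phi3_eos_token():
    processor = AutoProcessor.from_pretrained("xtuner/llava-phi-3-mini-hf")
    processor.tokenizer.eos_token = "<|end|>"
    end_id = processor.tokenizer.convert_tokens_to_ids("<|end|>")
    assert processor.tokenizer.eos_token_id == end_id
```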
Please don't forget to rename the folder to
Closes #11
Hey @s-smits, thank you very much for your contributions! During my tests I found that this model uses the existing llava implementation, so I will close this PR. You can use the already pre-quantized model in the hub:
Just install the latest version:
You're welcome. Due to other projects I couldn't finish it in time; thank you for converting it.
The model almost works, but the chat template is wrong. Maybe wait for an official update from HF for apply_chat_template?