
Adding LLaVA-Phi-Mini-3 #12

Closed · wants to merge 5 commits
Conversation

@s-smits commented Apr 25, 2024

The model almost works, but the chat template appears to be incorrect. Maybe wait for an official update from HF for apply_chat_template?

air@MacBook-Air-van-Air mlx-vlm % python3.10 -m mlx_vlm.generate --model xtuner/llava-phi-3-mini-hf --max-tokens 100 --temp 0.0                       
model-00002-of-00002.safetensors: 100% 3.31G/3.31G [11:01<00:00, 5.01MB/s]
model-00001-of-00002.safetensors: 100% 4.99G/4.99G [13:49<00:00, 6.01MB/s]
Fetching 11 files: 100% 11/11 [13:49<00:00, 75.41s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
==========
Image: http://images.cocodataset.org/val2017/000000039769.jpg 

Prompt: <s><|user|>
<image>
What are these?<|end|>
<|assistant|>

These are two cats sleeping on a pink couch.<|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|>
==========
Prompt: 0.969 tokens-per-sec
Generation: 1.355 tokens-per-sec
@Blaizzy (Owner) commented Apr 25, 2024

Awesome, great work!

@Blaizzy (Owner) commented Apr 25, 2024

The model almost works, but the chat template appears to be incorrect. Maybe wait for an official update from HF for apply_chat_template?

What do you mean by the chat template is wrong?
Could you elaborate?

@s-smits (Author) commented Apr 25, 2024

Thank you!
The output is the following:
These are two cats sleeping on a pink couch.<|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|><|end|>
From my experience this normally is a chat template problem, where the model does not know when to stop. What do you think?

@Blaizzy (Owner) commented Apr 25, 2024

Thank you! The output is the following: These are two cats sleeping on a pink couch.<|end|><|end|><|end|><|end|><|end|><|end|><|end|>... From my experience this normally is a chat template problem, where the model does not know when to stop. What do you think?

Hmm, the model is working fine. The problem is that the eos_token differs from the one used in the chat_template.

If you patch the tokenizer it will work :)

tokenizer.eos_token = "<|end|>"
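For reference, a minimal sketch of where such a patch could go; this assumes a `load` helper that returns a (model, processor) pair and that the tokenizer lives on the processor, so treat it as illustrative rather than the exact mlx-vlm API:

    # Hypothetical usage sketch, not the exact mlx-vlm API.
    from mlx_vlm import load  # assumed helper returning (model, processor)

    model, processor = load("xtuner/llava-phi-3-mini-hf")

    # Patch the end-of-sequence token so generation stops at "<|end|>"
    # instead of emitting it over and over.
    processor.tokenizer.eos_token = "<|end|>"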

@s-smits (Author) commented Apr 25, 2024

Tried two options:

Option 1

Add

    self.tokenizer.eos_token = "<|end|>"

to the language.LanguageModel init.

Option 2

Add

    model.language_model.tokenizer.eos_token = "<|end|>"
    model.language_model.tokenizer.pad_token = "<|end|>"

to llavaPhi.Model.from_pretrained, at the end before return model.

Unfortunately, neither worked.

@Blaizzy (Owner) commented Apr 26, 2024

The tokenizer is in the processor.

LLaVA-type models usually have it set up as processor.tokenizer, while others, such as nanollava, have it as the processor itself.

Check:

https://github.com/Blaizzy/mlx-vlm/blob/main/mlx_vlm/utils.py#L714-L719
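In other words, a small sketch mirroring the lookup in the linked utils.py lines (the function name here is illustrative):

    # LLaVA-style models nest the tokenizer under processor.tokenizer,
    # while models such as nanollava use the processor itself as the tokenizer.
    def resolve_tokenizer(processor, image_processor=None):
        if image_processor is not None:
            return processor          # the processor doubles as the tokenizer
        return processor.tokenizer    # tokenizer nested inside the processor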

@s-smits (Author) commented Apr 26, 2024

Got it working with utils.py, lines 714-723:

    if image_processor is not None:
        prompt_tokens = mx.array(processor.encode(prompt))
        tokenizer = processor
    else:
        prompt_tokens = mx.array(processor.tokenizer.encode(prompt))
        tokenizer = processor.tokenizer
    
    if model.config.text_config.model_type == 'llama':
        processor.tokenizer.eos_token = "<|end|>"
        processor.tokenizer.pad_token = "<|end|>"

However, this also overrides the llava-1.5 model, right? So I tried to add an extra condition, but I cannot check model_type == 'llava':

Traceback (most recent call last):
  File "/opt/homebrew/Cellar/[email protected]/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Cellar/[email protected]/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/air/Repositories/mlx-vlm/mlx_vlm/generate.py", line 95, in <module>
    main()
  File "/Users/air/Repositories/mlx-vlm/mlx_vlm/generate.py", line 82, in main
    generate(
  File "/Users/air/Repositories/mlx-vlm/mlx_vlm/utils.py", line 724, in generate
    print('model type',model.config.model_type)
AttributeError: 'ModelConfig' object has no attribute 'model_type'
{
  "architectures": [
    "LlavaForConditionalGeneration"
  ],
  "ignore_index": -100,
  "image_token_index": 32038,
  "model_type": "llava",
  "pad_token_id": 32039,
  "projector_hidden_act": "gelu",
  "text_config": {
    "architectures": [
      "LlamaForCausalLM"
    ],
    "eos_token_id": 32000,
    "hidden_size": 3072,
    "intermediate_size": 8192,
    "max_position_embeddings": 4096,
    "model_type": "llama",
    "original_max_position_embeddings": 4096,
    "pad_token_id": 32000,
    "rms_norm_eps": 1e-05,
    "sliding_window": 2048,
    "torch_dtype": "float16",
    "vocab_size": 32064
  },
  "torch_dtype": "float16",
  "transformers_version": "4.40.1",
  "vision_config": {
    "architectures": [
      "CLIPVisionModel"
    ],
    "dropout": 0.0,
    "hidden_size": 1024,
    "image_size": 336,
    "intermediate_size": 4096,
    "model_type": "clip_vision_model",
    "num_attention_heads": 16,
    "num_hidden_layers": 24,
    "patch_size": 14,
    "projection_dim": 768,
    "torch_dtype": "float32"
  },
  "vision_feature_layer": -2,
  "vision_feature_select_strategy": "default"
}

If you know how to access the right component where the model config's main architecture == 'llama', could you implement this? Then we can merge.
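A minimal sketch of one way this could be done, assuming the converted model directory contains the Hugging Face config.json (the path handling and function name are illustrative): read the raw config instead of the ModelConfig dataclass, which, per the traceback above, has no model_type attribute.

    import json
    from pathlib import Path

    def top_level_model_type(model_path):
        # Read the raw Hugging Face config.json, since the ModelConfig
        # dataclass does not expose the top-level `model_type`.
        with open(Path(model_path) / "config.json") as f:
            return json.load(f).get("model_type")  # "llava" for this checkpoint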

@Blaizzy (Owner) commented Apr 26, 2024

Got it working with utils.py 714:723

    if image_processor is not None:
        prompt_tokens = mx.array(processor.encode(prompt))
        tokenizer = processor
    else:
        prompt_tokens = mx.array(processor.tokenizer.encode(prompt))
        tokenizer = processor.tokenizer
    
    if model.config.text_config.model_type == 'llama':
        processor.tokenizer.eos_token = "<|end|>"
        processor.tokenizer.pad_token = "<|end|>"

Great job!

Let's keep everything as it was.

I would prefer that we patch the tokenizer directly and save it when we convert the model for the MLX hub :)

That way we don't have to add that extra logic here.
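A rough sketch of what patching at conversion time could look like, using the standard Hugging Face tokenizer API; the output path is illustrative and this is not the actual mlx-vlm conversion code:

    from transformers import AutoTokenizer

    # Patch the tokenizer once, at conversion time, and save it alongside the
    # converted weights so no runtime special-casing is needed.
    tokenizer = AutoTokenizer.from_pretrained("xtuner/llava-phi-3-mini-hf")
    tokenizer.eos_token = "<|end|>"
    tokenizer.pad_token = "<|end|>"
    tokenizer.save_pretrained("path/to/converted-mlx-model")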

@Blaizzy (Owner) commented Apr 26, 2024

Btw, I recently added tests in #13.

If you could add the test cases once you finish, it would be great.

@Blaizzy (Owner) commented Apr 26, 2024

Please don't forget to rename the folder to llavaPhi3

@Blaizzy (Owner) commented Apr 29, 2024

Closes #11

@Blaizzy (Owner) commented Apr 30, 2024

Hey @s-smits,

Thank you very much for your contributions!

During my tests, I found that this model uses the existing llava implementation, so I will close this PR.

You can use the already pre-quantized model in the hub. Just install the latest version:
pip install -U mlx-vlm
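Then generation can be run the same way as at the top of this thread; the hub model id below is a placeholder, since the exact id is not given here:

python3 -m mlx_vlm.generate --model <pre-quantized-llava-phi-3-model-id> --max-tokens 100 --temp 0.0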

@Blaizzy closed this Apr 30, 2024
@s-smits (Author) commented May 1, 2024

You're welcome. Due to other projects I couldn't finish it in time; thank you for converting it.
