Chat template error with MLX Community LLava models (moved from FastMLX) #51

Closed
stewartugelow opened this issue Jul 12, 2024 · 16 comments · Fixed by #54
Comments

@stewartugelow

Continued from: https://github.com/Blaizzy/fastmlx/issues/6


When I run this at the command line: "python -m mlx_vlm.chat_ui --model mlx-community/llava-1.5-7b-4bit", I get the same chat template error with each of the following models:

models--mlx-community--llava-1.5-7b-4bit
models--mlx-community--llava-llama-3-8b-v1_1-8bit
models--mlx-community--llava-phi-3-mini-4bit
models--mlx-community--llava-v1.6-mistral-7b-8bit


Logs:

mlx-community/llava-1.5-7b-4bit

(rbuild) (base) Stewarts-MacBook-Pro:vmlx stewart$ python -m mlx_vlm.chat_ui --model mlx-community/llava-1.5-7b-4bit
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Fetching 9 files: 100%|████████████████████████| 9/9 [00:00<00:00, 88820.56it/s]
Fetching 9 files: 100%|████████████████████████| 9/9 [00:00<00:00, 30740.01it/s]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Fetching 9 files: 100%|████████████████████████| 9/9 [00:00<00:00, 68759.08it/s]
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/queueing.py", line 541, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/blocks.py", line 1928, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/blocks.py", line 1526, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 657, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 783, in asyncgen_wrapper
    response = await iterator.__anext__()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/chat_interface.py", line 592, in _stream_fn
    first_response = await async_iteration(generator)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 657, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 650, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 633, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/mlx_vlm/chat_ui.py", line 103, in chat
    messages = processor.apply_chat_template(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/transformers/processing_utils.py", line 926, in apply_chat_template
    raise ValueError(
ValueError: No chat template is set for this processor. Please either set the `chat_template` attribute, or provide a chat template as an argument. See https://huggingface.co/docs/transformers/main/en/chat_templating for more information.

mlx-community/llava-v1.6-mistral-7b-8bit

(rbuild) (base) Stewarts-MacBook-Pro:vmlx stewart$ python -m mlx_vlm.chat_ui --model mlx-community/llava-v1.6-mistral-7b-8bit
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Fetching 10 files: 100%|████████████████████| 10/10 [00:00<00:00, 110960.42it/s]
Fetching 10 files: 100%|█████████████████████| 10/10 [00:00<00:00, 34865.37it/s]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Fetching 10 files: 100%|████████████████████| 10/10 [00:00<00:00, 108942.96it/s]
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/queueing.py", line 541, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/blocks.py", line 1928, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/blocks.py", line 1526, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 657, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 783, in asyncgen_wrapper
    response = await iterator.__anext__()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/chat_interface.py", line 592, in _stream_fn
    first_response = await async_iteration(generator)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 657, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 650, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 633, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/mlx_vlm/chat_ui.py", line 103, in chat
    messages = processor.apply_chat_template(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/transformers/processing_utils.py", line 926, in apply_chat_template
    raise ValueError(
ValueError: No chat template is set for this processor. Please either set the `chat_template` attribute, or provide a chat template as an argument. See https://huggingface.co/docs/transformers/main/en/chat_templating for more information.
^CKeyboard interruption in main thread... closing server.

mlx-community/llava-llama-3-8b-v1_1-8bit

(rbuild) (base) Stewarts-MacBook-Pro:vmlx stewart$ python -m mlx_vlm.chat_ui --model mlx-community/llava-llama-3-8b-v1_1-8bit
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Fetching 8 files: 100%|████████████████████████| 8/8 [00:00<00:00, 74731.47it/s]
Fetching 8 files: 100%|█████████████████████████| 8/8 [00:00<00:00, 9742.87it/s]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Fetching 8 files: 100%|████████████████████████| 8/8 [00:00<00:00, 34344.35it/s]
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/queueing.py", line 541, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/blocks.py", line 1928, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/blocks.py", line 1526, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 657, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 783, in asyncgen_wrapper
    response = await iterator.__anext__()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/chat_interface.py", line 592, in _stream_fn
    first_response = await async_iteration(generator)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 657, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 650, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 633, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/mlx_vlm/chat_ui.py", line 103, in chat
    messages = processor.apply_chat_template(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/transformers/processing_utils.py", line 926, in apply_chat_template
    raise ValueError(
ValueError: No chat template is set for this processor. Please either set the `chat_template` attribute, or provide a chat template as an argument. See https://huggingface.co/docs/transformers/main/en/chat_templating for more information.
^CKeyboard interruption in main thread... closing server.

mlx-community/llava-phi-3-mini-4bit

(rbuild) (base) Stewarts-MacBook-Pro:vmlx stewart$ python -m mlx_vlm.chat_ui --model mlx-community/llava-phi-3-mini-4bit
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
preprocessor_config.json: 100%|████████████████| 819/819 [00:00<00:00, 8.36MB/s]
added_tokens.json: 100%|███████████████████████| 978/978 [00:00<00:00, 19.6MB/s]
config.json: 100%|█████████████████████████| 1.33k/1.33k [00:00<00:00, 18.7MB/s]
special_tokens_map.json: 100%|█████████████████| 615/615 [00:00<00:00, 3.51MB/s]
model.safetensors.index.json: 100%|██████████| 129k/129k [00:00<00:00, 9.68MB/s]
tokenizer_config.json: 100%|███████████████| 8.45k/8.45k [00:00<00:00, 46.1MB/s]
tokenizer.model: 100%|███████████████████████| 500k/500k [00:00<00:00, 12.5MB/s]
tokenizer.json: 100%|██████████████████████| 1.85M/1.85M [00:00<00:00, 8.59MB/s]
model.safetensors: 100%|███████████████████| 2.47G/2.47G [00:57<00:00, 43.2MB/s]
Fetching 9 files: 100%|███████████████████████████| 9/9 [00:57<00:00,  6.41s/it]
Fetching 9 files: 100%|███████████████████████| 9/9 [00:00<00:00, 110054.62it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Fetching 9 files: 100%|████████████████████████| 9/9 [00:00<00:00, 27453.63it/s]
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/queueing.py", line 541, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/blocks.py", line 1928, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/blocks.py", line 1526, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 657, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 783, in asyncgen_wrapper
    response = await iterator.__anext__()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/chat_interface.py", line 592, in _stream_fn
    first_response = await async_iteration(generator)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 657, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 650, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 633, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/mlx_vlm/chat_ui.py", line 103, in chat
    messages = processor.apply_chat_template(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/transformers/processing_utils.py", line 926, in apply_chat_template
    raise ValueError(
ValueError: No chat template is set for this processor. Please either set the `chat_template` attribute, or provide a chat template as an argument. See https://huggingface.co/docs/transformers/main/en/chat_templating for more information.
^CKeyboard interruption in main thread... closing server.
@Blaizzy
Owner

Blaizzy commented Jul 15, 2024

Hey @stewartugelow

I still haven't managed to replicate your issue.

Here is what I did:

  1. Reinstall mlx-vlm v0.0.11
  2. Remove all cached models
  3. Download the model weights and run them

And it still works as expected on two different machines.
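For reference, here is a rough sketch of steps 2 and 3 (not a verbatim record of what I ran), assuming the default Hugging Face cache location under ~/.cache/huggingface/hub; step 1 is just pip install -U mlx-vlm==0.0.11 on the command line:

import shutil
from pathlib import Path

from mlx_vlm.utils import load

# Step 2: drop any stale cached snapshots of the llava models.
hub_cache = Path.home() / ".cache" / "huggingface" / "hub"
for repo_dir in hub_cache.glob("models--mlx-community--llava-*"):
    shutil.rmtree(repo_dir)

# Step 3: re-download the weights and load the model fresh.
model, processor = load("mlx-community/llava-phi-3-mini-4bit")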

mlx-community/llava-phi-3-mini-4bit

[Screenshot: working chat UI, 2024-07-15 at 2:10 PM]

mlx-community/llava-llama-3-8b-v1_1-8bit

[Screenshot: working chat UI, 2024-07-15 at 1:35 PM]

mlx-community/llava-1.5-7b-4bit

[Screenshot: working chat UI, 2024-07-15 at 2:10 PM]

@Blaizzy
Owner

Blaizzy commented Jul 15, 2024

pip list | grep mlx

fastmlx                                   0.1.0
mlx                                       0.15.2
mlx-lm                                    0.16.0            /Users/prince_canuma/Documents/Projects/LLMs/mlx-lm/llms
mlx-vlm                                   0.0.11

@Blaizzy
Owner

Blaizzy commented Jul 15, 2024

@stewartugelow could you share the output of:

from mlx_vlm.utils import load

model_path = "mlx-community/llava-phi-3-mini-4bit"
model, processor = load(model_path)
print(processor.__dict__)

and

prompt = processor.tokenizer.apply_chat_template(
    [{"role": "user", "content": f"<image>What are these?"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
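For context, here is a minimal sketch (assumed, not the project's actual chat_ui code) of the fallback this is probing for: when the processor itself has no chat_template set, the template carried by the underlying tokenizer can be used instead.

from mlx_vlm.utils import load

model_path = "mlx-community/llava-phi-3-mini-4bit"
model, processor = load(model_path)

messages = [{"role": "user", "content": "<image>What are these?"}]
try:
    # This is the call that raises in chat_ui.py when the processor has no template.
    prompt = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
except ValueError:
    # Fall back to the tokenizer-level template, which these repos usually still carry.
    prompt = processor.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
print(prompt)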

@BoltzmannEntropy

BoltzmannEntropy commented Jul 31, 2024

I have a similar issue:

python -m mlx_vlm.chat_ui --model mlx-community/Bunny-Llama-3-8B-V-8bit
Fetching 9 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 89030.04it/s]
Fetching 9 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 23801.22it/s]
Traceback (most recent call last):
  File "/path/to/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/path/to/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/path/to/python3.10/site-packages/mlx_vlm/chat_ui.py", line 35, in <module>
    model, processor = load(args.model, {"trust_remote_code": True})
  File "/path/to/python3.10/site-packages/mlx_vlm/utils.py", line 244, in load
    model = load_model(model_path, lazy)
  File "/path/to/python3.10/site-packages/mlx_vlm/utils.py", line 153, in load_model
    text_config = AutoConfig.from_pretrained(config["language_model"])
KeyError: 'language_model'

@Blaizzy
Owner

Blaizzy commented Aug 1, 2024

@BoltzmannEntropy could you share the version of mlx-vlm you are running?

@BoltzmannEntropy

Sure:

sol@mprox dev % pip freeze | grep mlx                                            
mlx==0.16.1
mlx-lm==0.16.1
mlx-vlm==0.0.11

@Blaizzy
Owner

Blaizzy commented Aug 1, 2024

@BoltzmannEntropy the problem is fixed. It was a missing key in the config :)

@BoltzmannEntropy

While this works:

huggingface-cli download --local-dir Bunny-Llama-3-8B-V-8bit mlx-community/Bunny-Llama-3-8B-V-8bit

the command:

python -m mlx_vlm.chat_ui --model mlx-community/Bunny-Llama-3-8B-V-8bit

produces:

huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-66abb222-0a5ce8e526d64b151b4fab07;dbf7c596-05f5-4c69-8460-c66147d36261)

Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B/resolve/main/config.json.
Access to model meta-llama/Meta-Llama-3-8B is restricted. You must be authenticated to access it.

I never had to authenticate before.

@Blaizzy
Owner

Blaizzy commented Aug 1, 2024

It's not a bug.

You need access to the model config of a gated model :)
Just go to the repo on HF, meta-llama/Meta-Llama-3-8B, and request access.
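If you haven't authenticated locally yet, here is a minimal sketch of the usual step (assuming a standard huggingface_hub setup; huggingface-cli login does the same thing from the shell):

from huggingface_hub import login

# Prompts for a token created at https://huggingface.co/settings/tokens.
# Needed here because the Bunny config points at the gated meta-llama repo.
login()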

@BoltzmannEntropy

sol@mprox dev % python -m mlx_vlm.chat_ui --model mlx-community/llava-phi-3-mini-4bit  
Fetching 9 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 53242.22it/s]
Fetching 9 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 73298.52it/s]
Fetching 9 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 190650.18it/s]
Traceback (most recent call last):
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/mlx_vlm/chat_ui.py", line 133, in <module>
    demo = gr.ChatInterface(
TypeError: ChatInterface.__init__() got an unexpected keyword argument 'additional_inputs_accordion'

@Blaizzy
Owner

Blaizzy commented Aug 1, 2024

This is a gradio problem. They had many breaking changes recently.

I will fix it in the next release tomorrow and pin the gradio version to avoid such cases.

@Blaizzy
Owner

Blaizzy commented Aug 1, 2024

@stewartugelow I managed to replicate your issue as well, and will address it.

@Blaizzy
Owner

Blaizzy commented Aug 1, 2024

@stewartugelow @BoltzmannEntropy
Could you guys update to the latest gradio, install this PR #54 from source and give it a try to see if it fixes your issues?

@Blaizzy
Owner

Blaizzy commented Aug 1, 2024

Update gradio:

pip install -U gradio

@BoltzmannEntropy

dev % python -m mlx_vlm.chat_ui --model mlx-community/llava-phi-3-mini-4bit
Fetching 9 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 136770.78it/s]
Fetching 9 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 85598.04it/s]
Fetching 9 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 15051.33it/s]
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/route_utils.py", line 285, in call_process_api
    output = await app.get_blocks().process_api(
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
    result = await self.call_function(
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
    return await iterator.__anext__()
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/utils.py", line 768, in asyncgen_wrapper
    response = await iterator.__anext__()
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/chat_interface.py", line 652, in _stream_fn
    first_response = await async_iteration(generator)
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
    return await iterator.__anext__()
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/utils.py", line 656, in __anext__
    return await anyio.to_thread.run_sync(
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
    return next(iterator)
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/mlx_vlm/chat_ui.py", line 96, in chat
    if len(message["files"]) >= 1:
TypeError: 'MultimodalData' object is not subscriptable

@Blaizzy
Owner

Blaizzy commented Aug 1, 2024

You didn't install from source.

To install from source, first clone the repo and check out the PR branch, then run this from the repo root:

pip install -e .
