Chat template error with MLX Community LLava models (moved from FastMLX) #51
Hey @stewartugelow, I still haven't managed to replicate your issue. Here is what I did, and it still works as expected on two different machines:

- mlx-community/llava-phi-3-mini-4bit
- mlx-community/llava-llama-3-8b-v1_1-8bit
- mlx-community/llava-1.5-7b-4bit
pip list | grep mlx:

```
fastmlx    0.1.0
mlx        0.15.2
mlx-lm     0.16.0    /Users/prince_canuma/Documents/Projects/LLMs/mlx-lm/llms
mlx-vlm    0.0.11
```
@stewartugelow could you share the output of:

```python
from mlx_vlm.utils import load

model_path = "mlx-community/llava-phi-3-mini-4bit"
model, processor = load(model_path)
print(processor.__dict__)
```

and of:

```python
prompt = processor.tokenizer.apply_chat_template(
    [{"role": "user", "content": f"<image>What are these?"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```
I have a similar issue:
@BoltzmannEntropy could you share the version of mlx-vlm you are running?
Sure:
@BoltzmannEntropy the problem is fixed. It was a missing key in the config :)
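If the missing key was the chat template (an assumption on my part, going by the error in the title), a quick way to verify the updated config is to reload and check the tokenizer:

```python
from mlx_vlm.utils import load

# Reload the model and processor (a cached copy may need to be
# re-downloaded so the fixed config is picked up)
model, processor = load("mlx-community/llava-phi-3-mini-4bit")

# transformers tokenizers expose the template as `chat_template`;
# a non-None value means apply_chat_template should no longer error out
print(processor.tokenizer.chat_template is not None)
```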
While this works:
I never had to authenticate before
It's not a bug. The model is gated, so you have to authenticate to access its config :)
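For gated repositories you generally need to request access on the model page and then log in with a Hugging Face token; this is standard Hugging Face Hub behaviour rather than anything mlx-vlm specific. For example:

```
huggingface-cli login
```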
This is a Gradio problem. They have had many breaking changes recently. I will fix it in the next release tomorrow and pin the version to avoid such cases.
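Pinning could be as simple as an upper bound in the project requirements; the bound below is only illustrative, not the one the project actually ships:

```
# requirements.txt (illustrative bound, not the project's actual pin)
gradio>=4.0,<5.0
```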
@stewartugelow I managed to replicate your issue as well and will address it.
@stewartugelow @BoltzmannEntropy update Gradio:

```
pip install -U gradio
```
You didn't install from source. To install from source, first clone the branch, then run:
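A typical from-source install is an editable pip install of the cloned checkout (this assumes the Blaizzy/mlx-vlm repository; switch to the branch mentioned above before installing):

```
# assumes the Blaizzy/mlx-vlm repository; check out the branch with the fix first
git clone https://github.com/Blaizzy/mlx-vlm.git
cd mlx-vlm
pip install -e .
```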
Continued from: https://github.com/Blaizzy/fastmlx/issues/6
When I try this at the command line:

```
python -m mlx_vlm.chat_ui --model mlx-community/llava-1.5-7b-4bit
```

I get the same chat template errors with all of the following:

- models--mlx-community--llava-1.5-7b-4bit
- models--mlx-community--llava-llama-3-8b-v1_1-8bit
- models--mlx-community--llava-phi-3-mini-4bit
- models--mlx-community--llava-v1.6-mistral-7b-8bit
Logs:

- mlx-community/llava-1.5-7b-4bit
- mlx-community/llava-v1.6-mistral-7b-8bit
- mlx-community/llava-llama-3-8b-v1_1-8bit
- mlx-community/llava-phi-3-mini-4bit