mlx-engine - Apple MLX LLM Engine for LM Studio


mlx-engine

MLX engine for LM Studio


Built with

  • mlx-lm - Apple MLX inference engine (MIT)
  • Outlines - Structured output for LLMs (Apache 2.0)
  • mlx-vlm - Vision model inferencing for MLX (MIT)

How to use in LM Studio

LM Studio 0.3.4 and newer for Mac ships with mlx-engine pre-bundled. Download LM Studio from the official LM Studio website.


Standalone Demo

Prerequisites

  • macOS 14.0 (Sonoma) or greater.
  • python3.11
    • The requirements.txt file is compiled specifically for python3.11, the Python version bundled with the LM Studio MLX runtime.
    • brew install python@3.11 is a quick way to add python3.11 to your path without breaking your default Python setup.
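
The prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it assumes the macOS release string has the form returned by `platform.mac_ver()` (empty when not on macOS).

```python
import platform
import sys


def check_prerequisites(py_version, mac_release):
    """Return a list of problems; an empty list means the prerequisites are met.

    py_version:  (major, minor) tuple, e.g. sys.version_info[:2]
    mac_release: macOS release string, e.g. platform.mac_ver()[0] ("" off macOS)
    """
    problems = []
    # requirements.txt is compiled for python3.11 specifically.
    if tuple(py_version) != (3, 11):
        problems.append(
            f"requirements.txt targets python3.11, found {py_version[0]}.{py_version[1]}"
        )
    if not mac_release:
        problems.append("not running on macOS")
    elif int(mac_release.split(".")[0]) < 14:
        # macOS 14.0 (Sonoma) or greater is required.
        problems.append(f"macOS 14.0 or greater required, found {mac_release}")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites(sys.version_info[:2], platform.mac_ver()[0]):
        print("warning:", problem)
```

Running the file with the interpreter you plan to use prints one warning per unmet prerequisite and nothing when both are satisfied.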

Install Steps

To run a demo of model load and inference:

  1. Clone the repository
git clone https://github.com/lmstudio-ai/mlx-engine.git
cd mlx-engine
  2. Create a virtual environment (optional)
python3.11 -m venv .venv
source .venv/bin/activate
  3. Install the required dependency packages
pip install -U -r requirements.txt

Text Model Demo

Download models with the lms CLI tool. The lms CLI documentation can be found here: https://lmstudio.ai/docs/cli

Run the demo.py script with an MLX text generation model:

lms get mlx-community/Meta-Llama-3.1-8B-Instruct-4bit
python demo.py --model ~/.cache/lm-studio/models/mlx-community/Meta-Llama-3.1-8B-Instruct-4bit 

mlx-community/Meta-Llama-3.1-8B-Instruct-4bit - 4.53 GB

This command uses a default prompt formatted for Llama-3.1. For other models, pass a custom --prompt argument formatted the way that model expects:

lms get mlx-community/Mistral-Small-Instruct-2409-4bit
python demo.py --model ~/.cache/lm-studio/models/mlx-community/Mistral-Small-Instruct-2409-4bit --prompt "<s> [INST] How long will it take for an apple to fall from a 10m tree? [/INST]"

mlx-community/Mistral-Small-Instruct-2409-4bit - 12.52 GB
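
Hand-writing prompt markers in a shell string is easy to get wrong, so it can help to build the prompt and the argument list in Python. This is a minimal sketch under stated assumptions: `mistral_prompt` and `demo_command` are hypothetical helpers, the `[INST]` wrapping simply mirrors the example command above rather than an official chat template, and only the `--model` and `--prompt` flags shown in this README are used.

```python
import os
import shlex


def mistral_prompt(user_message):
    # Mirrors the "<s> [INST] ... [/INST]" wrapping from the example above.
    return f"<s> [INST] {user_message} [/INST]"


def demo_command(model_path, prompt):
    # --model and --prompt are the demo.py flags shown in this README.
    return ["python", "demo.py", "--model", model_path, "--prompt", prompt]


if __name__ == "__main__":
    # Expand "~" here: shlex.join quotes the path, so the shell would not expand it.
    model_path = os.path.expanduser(
        "~/.cache/lm-studio/models/mlx-community/Mistral-Small-Instruct-2409-4bit"
    )
    args = demo_command(
        model_path,
        mistral_prompt("How long will it take for an apple to fall from a 10m tree?"),
    )
    # Prints a shell-safe command line that can be pasted into a terminal.
    print(shlex.join(args))
```

The same argument list can be passed directly to `subprocess.run(args)` from the repository root, which avoids shell quoting entirely.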

Vision Model Demo

Run the demo.py script with an MLX vision model:

lms get mlx-community/pixtral-12b-4bit
python demo.py --model ~/.cache/lm-studio/models/mlx-community/pixtral-12b-4bit --prompt "<s>[INST]Compare these images[IMG][IMG][/INST]" --images demo-data/chameleon.webp demo-data/toucan.jpeg
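
In the command above, the prompt contains two `[IMG]` markers and two image paths are passed to `--images`. The sketch below assumes that this one-marker-per-image pattern is what the model expects, which is inferred from the example and not confirmed elsewhere in this README. `pixtral_prompt` is a hypothetical helper.

```python
def pixtral_prompt(text, image_paths):
    # Assumption: one [IMG] placeholder per image, as in the example command.
    placeholders = "[IMG]" * len(image_paths)
    return f"<s>[INST]{text}{placeholders}[/INST]"


if __name__ == "__main__":
    images = ["demo-data/chameleon.webp", "demo-data/toucan.jpeg"]
    print(pixtral_prompt("Compare these images", images))
```

Deriving the placeholders from the image list keeps the prompt and the `--images` arguments in step when images are added or removed.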

Currently supported vision models include:

  • Llama-3.2-Vision
    • lms get mlx-community/Llama-3.2-11B-Vision-Instruct-4bit
  • Pixtral
    • lms get mlx-community/pixtral-12b-4bit
  • Qwen2-VL
    • lms get mlx-community/Qwen2-VL-7B-Instruct-4bit
  • Llava-v1.6
    • lms get mlx-community/llava-v1.6-mistral-7b-4bit

Speculative Decoding Demo

Run the demo.py script with an MLX text generation model and a compatible --draft-model:

lms get mlx-community/Qwen2.5-7B-Instruct-4bit
lms get lmstudio-community/Qwen2.5-0.5B-Instruct-MLX-8bit
python demo.py \
    --model ~/.lmstudio/models/mlx-community/Qwen2.5-7B-Instruct-4bit \
    --draft-model ~/.lmstudio/models/lmstudio-community/Qwen2.5-0.5B-Instruct-MLX-8bit \
    --prompt "<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
Write a quick sort algorithm in C++<|im_end|>
<|im_start|>assistant
"
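
The multi-line `--prompt` above is awkward to type in a shell, since every newline and marker has to be exact. The following minimal sketch reproduces the same string in Python. `chatml_prompt` is a hypothetical helper, and the layout is copied from the example command rather than taken from a tokenizer's chat template.

```python
def chatml_prompt(system, user):
    # Each turn is wrapped in <|im_start|>role ... <|im_end|>, and the prompt
    # ends with an open assistant turn for the model to complete.
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    print(
        chatml_prompt(
            "You are Qwen, created by Alibaba Cloud. You are a helpful assistant.",
            "Write a quick sort algorithm in C++",
        ),
        end="",
    )
```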

Testing

To run tests, run the following command from the root of this repo:

python -m unittest discover tests

To test specific vision models, filter the tests by name with -k:

python -m unittest tests/test_vision_models.py -k pixtral
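
The `-k` option selects tests whose full name matches a pattern, and a value without `*` is treated as a substring. The sketch below shows that selection rule through the `unittest` loader. The test class and its method names are hypothetical stand-ins, not the tests in this repository.

```python
import unittest


class VisionModelTests(unittest.TestCase):
    # Hypothetical tests, used only to illustrate name filtering.
    def test_pixtral_loads(self):
        self.assertTrue(True)

    def test_llava_loads(self):
        self.assertTrue(True)


def select(substring):
    loader = unittest.TestLoader()
    # "-k pixtral" on the command line is converted to the pattern "*pixtral*".
    loader.testNamePatterns = [f"*{substring}*"]
    suite = loader.loadTestsFromTestCase(VisionModelTests)
    # Return only the method names of the tests that were selected.
    return sorted(test.id().rsplit(".", 1)[-1] for test in suite)


if __name__ == "__main__":
    print(select("pixtral"))
```

Matching is case-sensitive and applies to the module, class, and method name together, so a substring that appears in the class name selects every test in that class.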