MLX is an array framework for machine learning research on Apple silicon, brought to you by Apple machine learning research.
MLX is designed by machine learning researchers for machine learning researchers. The framework is intended to be user-friendly, yet still efficient for training and deploying models. The design of the framework itself is also conceptually simple. We intend to make it easy for researchers to extend and improve MLX, with the goal of quickly exploring new ideas.
Through MLX, LLMs can be accelerated on Apple Silicon devices, and models can be run locally very conveniently.
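As a quick illustration of the framework itself, here is a minimal sketch of MLX's NumPy-like, lazily evaluated array API using the `mlx.core` module:

```python
import mlx.core as mx

# Arrays live in unified memory, so the CPU and GPU can share them without copies.
a = mx.array([1.0, 2.0, 3.0])
b = mx.array([4.0, 5.0, 6.0])

# Operations are lazy; mx.eval() forces the computation to run.
c = a + b
mx.eval(c)
print(c)  # array([5, 7, 9], dtype=float32)
```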
- Python 3.11.x
- Install the MLX library:

```bash
pip install mlx-lm
```
Run Phi-3-mini with the mlx_lm generation CLI:

```bash
python -m mlx_lm.generate --model microsoft/Phi-3-mini-4k-instruct --max-tokens 2048 --prompt "<|user|>\nCan you introduce yourself<|end|>\n<|assistant|>"
```
The result (my environment is an Apple M1 Max, 64GB) is:
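The same generation can also be driven from Python. A minimal sketch, assuming a recent mlx-lm release (the exact keyword arguments of `generate` have varied between versions):

```python
from mlx_lm import load, generate

# Downloads the model from the Hugging Face Hub on first use (cached afterwards).
model, tokenizer = load("microsoft/Phi-3-mini-4k-instruct")

# Build the prompt with Phi-3's chat template instead of hand-writing the tags.
messages = [{"role": "user", "content": "Can you introduce yourself"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

print(generate(model, tokenizer, prompt=prompt, max_tokens=2048))
```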
To quantize the model, run mlx_lm.convert with the -q flag:

```bash
python -m mlx_lm.convert --hf-path microsoft/Phi-3-mini-4k-instruct -q
```
Note: The model can be quantized with mlx_lm.convert by passing the -q flag, and quantization defaults to INT4; this example quantizes Phi-3-mini to INT4. After quantization, the model is stored in the default directory ./mlx_model.
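The same conversion is available from Python. A sketch assuming the `convert` function exported by mlx-lm (keyword names may differ across releases):

```python
from mlx_lm import convert

# Convert the Hugging Face checkpoint to MLX format and quantize it.
# quantize=True enables quantization; 4-bit (INT4) is the default.
convert(
    hf_path="microsoft/Phi-3-mini-4k-instruct",
    mlx_path="./mlx_model",  # default output directory
    quantize=True,
)
```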
We can test the quantized model from the terminal:

```bash
python -m mlx_lm.generate --model ./mlx_model/ --max-tokens 2048 --prompt "<|user|>\nCan you introduce yourself<|end|>\n<|assistant|>"
```

The result is:
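The quantized model can likewise be loaded from its local directory in Python, since `load` accepts a local path as well as a Hub repo id (same version caveats as above):

```python
from mlx_lm import load, generate

# Load the INT4 model written to ./mlx_model by the convert step.
model, tokenizer = load("./mlx_model")

prompt = "<|user|>\nCan you introduce yourself<|end|>\n<|assistant|>"
print(generate(model, tokenizer, prompt=prompt, max_tokens=2048))
```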
Note: To read the complete sample, please click this link.
- Learn about the Apple MLX Framework: https://ml-explore.github.io
- Apple MLX GitHub Repo: https://github.com/ml-explore