# Using Stable Diffusion with Core ML on Apple Silicon
Thanks to Apple engineers, you can now run Stable Diffusion on Apple Silicon using Core ML!
Apple's ml-stable-diffusion repo provides conversion scripts and inference code based on 🧨 Diffusers, and we love it! To make it as easy as possible for you, we converted the weights ourselves and published the Core ML versions of the models on the Hugging Face Hub.
Update: some weeks after this post was written we created a native Swift app that you can use to run Stable Diffusion effortlessly on your own hardware. We released an app in the Mac App Store as well as the source code to allow other projects to use it.
The rest of this post guides you on how to use the converted weights in your own code or convert additional weights yourself.
The official Stable Diffusion checkpoints are already converted and ready for use:
- Stable Diffusion v1.4: converted weights and original checkpoint
- Stable Diffusion v1.5: converted weights and original checkpoint
- Stable Diffusion v2 base: converted weights and original checkpoint
- Stable Diffusion v2.1 base: converted weights and original checkpoint
Core ML supports all the compute units available on your device: CPU, GPU and Apple's Neural Engine (NE). It's also possible for Core ML to run different portions of the model on different devices to maximize performance.
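As a quick illustration, this is how you could restrict a converted model to a particular set of compute units when loading it with coremltools in Python. This is just a minimal sketch (the model path is a placeholder); Apple's inference script, shown later, exposes the same choice through its `--compute-unit` flag.

```python
import coremltools as ct

# Load a converted Core ML package, restricting execution to CPU + Neural Engine.
# "path/to/Unet.mlpackage" is a placeholder: use any model from the "packages" variants.
model = ct.models.MLModel(
    "path/to/Unet.mlpackage",
    compute_units=ct.ComputeUnit.CPU_AND_NE,  # alternatives: ALL, CPU_ONLY, CPU_AND_GPU
)
```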
There are several variants of each model that may yield different performance depending on the hardware you use. We recommend you try them out and stick with the one that works best on your system. Read on for details.
There are several variants per model:
- "Original" attention vs "split_einsum". These are two alternative implementations of the critical attention blocks.
split_einsum
was previously introduced by Apple, and is compatible with all the compute units (CPU, GPU and Apple's Neural Engine).original
, on the other hand, is only compatible with CPU and GPU. Nevertheless,original
can be faster thansplit_einsum
on some devices, so do check it out! - "ML Packages" vs "Compiled" models. The former is suitable for Python inference, while the
compiled
version is required for Swift code. Thecompiled
models in the Hub split the large UNet model weights in several files for compatibility with iOS and iPadOS devices. This corresponds to the--chunk-unet
conversion option.
At the time of this writing, we got the best results on a MacBook Pro (M1 Max, 32 GPU cores, 64 GB) using the following combination:
- `original` attention.
- `all` compute units (see the next section for details).
- macOS Ventura 13.1 Beta 4 (22C5059b).
With these, it took 18s to generate one image with the Core ML version of Stable Diffusion v1.4 🤯.
⚠️ Note: Several improvements to Core ML were introduced in macOS Ventura 13.1, and they are required by Apple's implementation. You may get black images (and much slower generation times) if you use previous versions of macOS.
Each model repo is organized in a tree structure that provides these different variants:
```
coreml-stable-diffusion-v1-4
├── README.md
├── original
│   ├── compiled
│   └── packages
└── split_einsum
    ├── compiled
    └── packages
```
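If you want to check which variants a repo provides before downloading anything, you can list its files with `huggingface_hub`. A minimal sketch, using the same repo id as the rest of this post:

```python
from huggingface_hub import list_repo_files

# List the attention/packaging variants available in the repo.
files = list_repo_files("apple/coreml-stable-diffusion-v1-4")
variants = sorted({"/".join(f.split("/")[:2]) for f in files if "/" in f})
print(variants)  # expect entries like 'original/packages' or 'split_einsum/compiled'
```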
You can download and use the variant you need as shown below.
```bash
pip install huggingface_hub
pip install git+https://github.com/apple/ml-stable-diffusion
```
To run inference in Python, you have to use one of the versions stored in the `packages` folders, because the compiled ones are only compatible with Swift. You may choose whether you want to use the `original` or `split_einsum` attention styles.
This is how you'd download the `original` attention variant from the Hub:
```python
from huggingface_hub import snapshot_download
from pathlib import Path

repo_id = "apple/coreml-stable-diffusion-v1-4"
variant = "original/packages"

model_path = Path("./models") / (repo_id.split("/")[-1] + "_" + variant.replace("/", "_"))
snapshot_download(repo_id, allow_patterns=f"{variant}/*", local_dir=model_path, local_dir_use_symlinks=False)
print(f"Model downloaded at {model_path}")
```
The code above will place the downloaded model snapshot inside a directory called `models`.
Once you have downloaded a snapshot of the model, the easiest way to run inference is to use Apple's Python script:
```bash
python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i models/coreml-stable-diffusion-v1-4_original_packages -o </path/to/output/image> --compute-unit ALL --seed 93
```
The `-i` argument should point to the checkpoint you downloaded in the step above, and `--compute-unit` indicates the hardware you want to allow for inference. It must be one of the following options: `ALL`, `CPU_AND_GPU`, `CPU_ONLY`, `CPU_AND_NE`. You may also provide an optional output path, and a seed for reproducibility.
The inference script assumes the original version of the Stable Diffusion model, stored in the Hub as `CompVis/stable-diffusion-v1-4`. If you use another model, you have to specify its Hub id in the inference command line, using the `--model-version` option. This works both for models already supported, and for custom models you trained or fine-tuned yourself.
For Stable Diffusion 1.5 (Hub id: `runwayml/stable-diffusion-v1-5`):
```bash
python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" --compute-unit ALL -o output --seed 93 -i models/coreml-stable-diffusion-v1-5_original_packages --model-version runwayml/stable-diffusion-v1-5
```
For Stable Diffusion 2 base (Hub id: `stabilityai/stable-diffusion-2-base`):
```bash
python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" --compute-unit ALL -o output --seed 93 -i models/coreml-stable-diffusion-2-base_original_packages --model-version stabilityai/stable-diffusion-2-base
```
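If you want to script several generations, for example to sweep over seeds, you can call the same CLI from Python. This is a minimal sketch that reuses the exact flags shown above; the output directory names are arbitrary, and note that each invocation reloads the model, so this is only convenient for small batches.

```python
import subprocess
from pathlib import Path

# Generate a few images with different seeds by invoking Apple's CLI repeatedly.
prompt = "a photo of an astronaut riding a horse on mars"
for seed in (93, 94, 95):
    out_dir = Path(f"output_seed_{seed}")
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        [
            "python", "-m", "python_coreml_stable_diffusion.pipeline",
            "--prompt", prompt,
            "-i", "models/coreml-stable-diffusion-v1-4_original_packages",
            "-o", str(out_dir),
            "--compute-unit", "ALL",
            "--seed", str(seed),
        ],
        check=True,
    )
```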
Running inference in Swift is slightly faster than in Python, because the models are already compiled in the `mlmodelc` format. This will be noticeable on app startup when the model is loaded, but shouldn't be noticeable if you run several generations afterwards.
To run inference in Swift on your Mac, you need one of the `compiled` checkpoint versions. We recommend you download them locally using Python code similar to the one we showed above, but using one of the `compiled` variants:
```python
from huggingface_hub import snapshot_download
from pathlib import Path

repo_id = "apple/coreml-stable-diffusion-v1-4"
variant = "original/compiled"

model_path = Path("./models") / (repo_id.split("/")[-1] + "_" + variant.replace("/", "_"))
snapshot_download(repo_id, allow_patterns=f"{variant}/*", local_dir=model_path, local_dir_use_symlinks=False)
print(f"Model downloaded at {model_path}")
```
To run inference, please clone Apple's repo:
```bash
git clone https://github.com/apple/ml-stable-diffusion
cd ml-stable-diffusion
```
Then run Apple's command-line tool through Swift Package Manager:
```bash
swift run StableDiffusionSample --resource-path models/coreml-stable-diffusion-v1-4_original_compiled --compute-units all "a photo of an astronaut riding a horse on mars"
```
You have to specify in `--resource-path` one of the checkpoints downloaded in the previous step, so please make sure it contains compiled Core ML bundles with the extension `.mlmodelc`. The `--compute-units` argument has to be one of these values: `all`, `cpuOnly`, `cpuAndGPU`, `cpuAndNeuralEngine`.
For more details, please refer to the instructions in Apple's repo.
If you have created your own models compatible with Stable Diffusion (for example, if you used Dreambooth, Textual Inversion or fine-tuning), then you have to convert the models yourself. Fortunately, Apple provides a conversion script that allows you to do so.
For this task, we recommend you follow the instructions in Apple's repo.
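For reference, at the time of writing the conversion entry point in Apple's repo looked roughly like the command below. Treat the exact flag names as an assumption and double-check them against the repo's README; `--chunk-unet` (mentioned earlier) can be added if you need the UNet weights split for iOS and iPadOS devices.

```bash
# Hedged example: convert a Hub model to Core ML packages.
# Verify flag names against apple/ml-stable-diffusion's README before running.
python -m python_coreml_stable_diffusion.torch2coreml \
    --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker \
    --model-version <your-model-id-on-the-hub> \
    -o <output-mlpackages-directory>
```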
We are really excited about the opportunities this brings and can't wait to see what the community can create from here. Some potential ideas are:
- Native, high-quality apps for Mac, iPhone and iPad.
- Bring additional schedulers to Swift, for even faster inference.
- Additional pipelines and tasks.
- Explore quantization techniques and further optimizations.
Looking forward to seeing what you create!