Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add example notebooks and support for system role #95

Merged
merged 4 commits into from
Oct 19, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
**.ipynb
test.ipynb
**.pyc

Byte-compiled / optimized / DLL files
Expand Down
Binary file added examples/images/cats.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added examples/images/desktop_setup.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added examples/images/graph.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added examples/images/handwritten_hard.webp
Binary file not shown.
Binary file added examples/images/latex.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added examples/images/paper.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
281 changes: 281 additions & 0 deletions examples/multi_image_generation.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,281 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Multi-Image Generation\n",
"\n",
"In this example, you will learn how to generate text from multiple images using the supported models: `Qwen2-VL`, `Pixtral` and `llava-interleaved`.\n",
"\n",
"Multi-image generation allows you to pass a list of images to the model and generate text conditioned on all the images.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from mlx_vlm import load, apply_chat_template, generate\n",
"from mlx_vlm.utils import load_image\n",
"from mlx_vlm.utils import process_image"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"images = [\"images/cats.jpg\", \"images/desktop_setup.png\"]\n",
"\n",
"messages = [\n",
" {\"role\": \"user\", \"content\": \"Describe what you see in the images.\"}\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Qwen2-VL"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load model and processor\n",
"qwen_vl_model, qwen_vl_processor = load(\"mlx-community/Qwen2-VL-7B-Instruct-4bit\")\n",
"qwen_vl_config = qwen_vl_model.config"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"prompt = apply_chat_template(qwen_vl_processor, qwen_vl_config, messages, num_images=len(images))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"==========\n",
"Image: ['images/cats.jpg', 'images/desktop_setup.png'] \n",
"\n",
"Prompt: <|im_start|>system\n",
"You are a helpful assistant.<|im_end|>\n",
"<|im_start|>user\n",
"Describe what you see in the images.<|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|im_end|>\n",
"<|im_start|>assistant\n",
"\n",
"The image shows a cozy home office setup with a pink blanket covering the desk and chair. There are two cats lounging on the blanket, one on the left side and the other on the right side of the desk. The desk has a computer monitor, keyboard, and mouse. There are also speakers on either side of the monitor, a remote control, and a small plant on the left side of the desk. The wall behind the desk has a framed text that says, \"Don't grow up, it's a trap.\" The overall scene is playful and relaxed, with the cats adding a touch of whimsy to the workspace.\n",
"==========\n",
"Prompt: 10.047 tokens-per-sec\n",
"Generation: 28.392 tokens-per-sec\n"
]
}
],
"source": [
"qwen_vl_output = generate(\n",
" qwen_vl_model,\n",
" qwen_vl_processor,\n",
" images,\n",
" prompt,\n",
" max_tokens=1000,\n",
" temperature=0.7,\n",
" verbose=True\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pixtral"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load model and processor\n",
"pixtral_model, pixtral_processor = load(\"mlx-community/pixtral-12b-4bit\")\n",
"pixtral_config = pixtral_model.config"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"\n",
"prompt = apply_chat_template(pixtral_processor, pixtral_config, messages, num_images=len(images))"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"# Pixtral requires images to be resized to the same shape in multi-image generation\n",
"resized_images = [process_image(load_image(image), (560, 560), None) for image in images]"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"==========\n",
"Image: [<PIL.Image.Image image mode=RGB size=560x420 at 0x3ACDD5FF0>, <PIL.Image.Image image mode=RGB size=560x347 at 0x39D697820>] \n",
"\n",
"Prompt: <s>[INST]Describe what you see in the images.[IMG][IMG][/INST]\n",
"The first image shows two cats lying on a pink couch. One cat is on the left side, and the other is on the right side. Both cats appear to be relaxed and comfortable. There are two remote controls on the couch, one near each cat. The background of the image is a plain, light-colored wall.\n",
"\n",
"The second image depicts a home office setup. The main elements include:\n",
"\n",
"1. A wooden desk with a computer monitor in the center.\n",
"2. Two black speakers on either side of the monitor.\n",
"3. A black office chair in front of the desk.\n",
"4. A wooden shelf on the left side of the desk, holding various items including records and a potted plant.\n",
"5. A framed poster on the wall behind the desk with the text \"Don't Grow Up, It's a Trap.\"\n",
"6. A drum set on the right side of the image, partially visible.\n",
"\n",
"The overall setting of the second image is a modern, minimalist workspace with personal touches.\n",
"==========\n",
"Prompt: 2.203 tokens-per-sec\n",
"Generation: 24.366 tokens-per-sec\n"
]
}
],
"source": [
"pixtral_output = generate(\n",
" pixtral_model,\n",
" pixtral_processor,\n",
" resized_images,\n",
" prompt,\n",
" max_tokens=1000,\n",
" temperature=0.7,\n",
" verbose=True\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Llava-Interleaved"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load model and processor\n",
"llava_model, llava_processor = load(\"mlx-community/llava-interleave-qwen-0.5b-bf16\")\n",
"llava_config = llava_model.config"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"prompt = apply_chat_template(llava_processor, llava_config, messages, num_images=len(images))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Expanding inputs for image tokens in LLaVa should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.47.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"==========\n",
"Image: ['images/cats.jpg', 'images/desktop_setup.png'] \n",
"\n",
"Prompt: <|im_start|>user\n",
"<image><image>\n",
"Describe what you see in the images.<|im_end|>\n",
"<|im_start|>assistant\n",
"\n",
"The image captures a cozy scene in a room. Two cats, one gray and the other brown and white, are lying on a pink couch. The gray cat is resting its head on the back of the couch, while the brown and white cat is lying on its side. They are both facing the camera, their relaxed postures suggesting a sense of comfort and tranquility.\n",
"\n",
"The room itself is a study in simplicity. A whiteboard hangs on the wall, a black computer monitor sits on a wooden desk, and a black speaker stands tall on a shelf. The walls are painted a light pink, providing a warm and inviting backdrop to the scene.\n",
"\n",
"On the desk, there's a black keyboard and a white mouse, ready for use. A plant sits on a small table next to the desk, adding a touch of nature to the room. A whiteboard eraser is also present on the desk, perhaps used for cleaning or marking.\n",
"\n",
"Overall, the image paints a picture of a comfortable and well-organized living space, where the cats have found their place and are enjoying their time.\n",
"==========\n",
"Prompt: 28.728 tokens-per-sec\n",
"Generation: 52.317 tokens-per-sec\n"
]
}
],
"source": [
"llava_output = generate(\n",
" llava_model,\n",
" llava_processor,\n",
" images,\n",
" prompt,\n",
" max_tokens=1000,\n",
" temperature=0.7,\n",
" verbose=True\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "mlx_code",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
424 changes: 424 additions & 0 deletions examples/object_detection.ipynb

Large diffs are not rendered by default.

Loading
Loading