forked from vllm-project/vllm
[Core] Support image processor (vllm-project#4197)
1 parent dfbe60d, commit 7a64d24
Showing 29 changed files with 1,042 additions and 256 deletions.
@@ -0,0 +1,51 @@
Multi-Modality
==============

.. currentmodule:: vllm.multimodal

vLLM provides experimental support for multi-modal models through the :mod:`vllm.multimodal` package.

:class:`vllm.inputs.PromptStrictInputs` accepts an additional attribute ``multi_modal_data``
which allows you to pass in multi-modal input alongside text and token prompts.

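
As a rough sketch of the shape of such inputs, the two dict-style prompt forms can be modelled as follows. The ``TextPromptSketch`` and ``TokensPromptSketch`` types are hypothetical stand-ins written for this illustration, not the definitions in :mod:`vllm.inputs`; the ``prompt_token_ids`` key name and the placeholder payload are assumptions as well.

.. code-block:: python

    from typing import Any, List, TypedDict


    class TextPromptSketch(TypedDict, total=False):
        """Hypothetical stand-in: a text prompt with optional multi-modal data."""
        prompt: str
        multi_modal_data: Any


    class TokensPromptSketch(TypedDict, total=False):
        """Hypothetical stand-in: a token prompt with optional multi-modal data."""
        prompt_token_ids: List[int]
        multi_modal_data: Any


    # Placeholder for a multi-modal data instance (for example, image data).
    image_data = object()

    text_inputs: TextPromptSketch = {
        "prompt": "Describe the image.",
        "multi_modal_data": image_data,
    }
    token_inputs: TokensPromptSketch = {
        "prompt_token_ids": [1, 2, 3],  # made-up token IDs
        "multi_modal_data": image_data,
    }

In both forms, ``multi_modal_data`` travels next to the prompt rather than replacing it.
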

By default, vLLM models do not support multi-modal inputs. To enable multi-modal support for a model,
you must decorate the model class with :meth:`MULTIMODAL_REGISTRY.register_dummy_data <MultiModalRegistry.register_dummy_data>`,
as well as :meth:`MULTIMODAL_REGISTRY.register_input <MultiModalRegistry.register_input>` for each modality type to support.

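
The decorator-based registration described above can be illustrated with a self-contained toy. Everything in this sketch (``ToyRegistry``, ``ToyImageData``, ``ToyVisionModel`` and the helper functions) is hypothetical and only mimics the pattern; the actual signatures of ``register_dummy_data`` and ``register_input`` in :class:`MultiModalRegistry` may differ.

.. code-block:: python

    from typing import Callable, Dict


    class ToyRegistry:
        """Hypothetical, simplified stand-in for a multi-modal registry."""

        def __init__(self) -> None:
            # model class -> factory that produces placeholder data
            self.dummy_factories: Dict[type, Callable[[], object]] = {}
            # model class -> {data type -> input processor}
            self.input_processors: Dict[type, Dict[type, Callable]] = {}

        def register_dummy_data(self, factory: Callable[[], object]):
            def wrapper(model_cls: type) -> type:
                self.dummy_factories[model_cls] = factory
                return model_cls  # the class itself is returned unchanged
            return wrapper

        def register_input(self, data_type: type, processor: Callable):
            def wrapper(model_cls: type) -> type:
                self.input_processors.setdefault(model_cls, {})[data_type] = processor
                return model_cls
            return wrapper


    TOY_REGISTRY = ToyRegistry()


    class ToyImageData:
        def __init__(self, pixels):
            self.pixels = pixels


    def make_dummy_image() -> ToyImageData:
        return ToyImageData(pixels=[0.0] * 4)


    def process_image(data: ToyImageData) -> dict:
        # Turn the raw data into keyword arguments for the model.
        return {"pixel_values": data.pixels}


    # One register_input decorator per supported modality, plus the dummy data.
    @TOY_REGISTRY.register_input(ToyImageData, process_image)
    @TOY_REGISTRY.register_dummy_data(make_dummy_image)
    class ToyVisionModel:
        pass

Because each decorator returns the class unchanged, decorators can be stacked, and a model that supports several modalities simply adds one ``register_input`` line per data type.
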

.. contents::
   :local:
   :backlinks: none

Module Contents
+++++++++++++++

.. automodule:: vllm.multimodal

Registry
--------

.. data:: vllm.multimodal.MULTIMODAL_REGISTRY

    The global :class:`MultiModalRegistry` which is used by model runners.

.. autoclass:: vllm.multimodal.MultiModalRegistry
    :members:
    :show-inheritance:

Base Classes
------------

.. autoclass:: vllm.multimodal.MultiModalData
    :members:
    :show-inheritance:

.. autoclass:: vllm.multimodal.MultiModalPlugin
    :members:
    :show-inheritance:

Image Classes
-------------

.. automodule:: vllm.multimodal.image
    :members:
    :show-inheritance:

@@ -0,0 +1,56 @@
.. _vlm:

Using VLMs
==========

This document shows you how to run and serve Vision Language Models (VLMs) using vLLM.

Engine Arguments
----------------

The following :ref:`engine arguments <engine_args>` are specific to VLMs:

.. argparse::
    :module: vllm.engine.arg_utils
    :func: _vlm_engine_args_parser
    :prog: -m vllm.entrypoints.openai.api_server
    :nodefaultconst:

Offline Batched Inference
-------------------------

To initialize a VLM, pass the engine arguments described above to the ``LLM`` class when instantiating the engine.

.. code-block:: python

    from vllm import LLM

    llm = LLM(
        model="llava-hf/llava-1.5-7b-hf",
        image_input_type="pixel_values",
        image_token_id=32000,
        image_input_shape="1,3,336,336",
        image_feature_size=576,
    )

For now, only a single image per text prompt is supported. To pass an image to the model, note the following in :class:`vllm.inputs.PromptStrictInputs`:

* ``prompt``: The prompt should contain a number of ``<image>`` tokens equal to ``image_feature_size``.
* ``multi_modal_data``: This should be an instance of :class:`~vllm.multimodal.image.ImagePixelData` or :class:`~vllm.multimodal.image.ImageFeatureData`.

.. code-block:: python

    from vllm.multimodal.image import ImagePixelData

    # The prompt needs one <image> token per image feature (576 here).
    prompt = "<image>" * 576 + (
        "\nUSER: What is the content of this image?\nASSISTANT:")

    # Load the image using PIL.Image
    image = ...

    outputs = llm.generate({
        "prompt": prompt,
        "multi_modal_data": ImagePixelData(image),
    })

    for o in outputs:
        generated_text = o.outputs[0].text
        print(generated_text)

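
The count of 576 in the prompt is the same value passed as ``image_feature_size``. One plausible way to arrive at it, assuming (this is not stated in this document) that the model's vision encoder splits a 336x336 image into non-overlapping 14x14 patches, is (336 / 14)^2 = 576. The helpers below are hypothetical and only illustrate the arithmetic and the prompt construction; they are not part of vLLM.

.. code-block:: python

    from typing import Tuple


    def parse_image_input_shape(shape: str) -> Tuple[int, ...]:
        """Parse a comma-separated shape string such as "1,3,336,336"."""
        return tuple(int(dim) for dim in shape.split(","))


    def patch_grid_feature_size(height: int, width: int, patch_size: int) -> int:
        """Count the non-overlapping square patches that tile the image."""
        if height % patch_size or width % patch_size:
            raise ValueError("image size must be a multiple of the patch size")
        return (height // patch_size) * (width // patch_size)


    def build_prompt(question: str, image_feature_size: int,
                     image_token: str = "<image>") -> str:
        """Prefix the question with one image token per image feature."""
        return image_token * image_feature_size + f"\nUSER: {question}\nASSISTANT:"


    batch, channels, height, width = parse_image_input_shape("1,3,336,336")
    # patch_size=14 is an assumption about the vision encoder.
    feature_size = patch_grid_feature_size(height, width, patch_size=14)
    prompt = build_prompt("What is the content of this image?", feature_size)

Deriving the token count from the configured shape keeps the prompt and the engine arguments in agreement if either one changes.
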

A code example can be found in `examples/llava_example.py <https://github.com/vllm-project/vllm/blob/main/examples/llava_example.py>`_.