Using ChatUniVi with vLLM #62

Open
jerilkuriakose opened this issue Oct 23, 2024 · 0 comments

@jerilkuriakose
Hi,

Thank you for this nice package. How can I use it with vLLM, TGI, or TRT-LLM?

While trying to use it with vLLM, I am getting the following error:

An error occurred: The checkpoint you are trying to load has model type `ChatUniVi` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
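
For context: that `model_type` check is driven by the checkpoint's config.json, which both Transformers and vLLM read to decide which architecture to load. A minimal sketch (stdlib only, assuming the same local checkpoint path as in the script below) for inspecting that metadata:

import json
from pathlib import Path

# Assumed path: the same local checkpoint directory used in the script below.
ckpt = Path("/tmp/models/chatunivi_v5")

with open(ckpt / "config.json") as f:
    cfg = json.load(f)

# Transformers maps `model_type` to a config class, and vLLM maps the entries in
# `architectures` to one of its own model implementations; `ChatUniVi` is not in
# either mapping out of the box, which is what the error above is reporting.
print("model_type:   ", cfg.get("model_type"))
print("architectures:", cfg.get("architectures"))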

The following is the code that I am using:

from PIL import Image
import requests
from io import BytesIO

from vllm import LLM, SamplingParams


def run_llava(question: str, modality: str):
    assert modality == "image"

    prompt = f"USER: <image>\n{question}\nASSISTANT:"

    llm = LLM(model="/tmp/models/chatunivi_v5", max_model_len=4096)
    stop_token_ids = None
    return llm, prompt, stop_token_ids


def test_run_llava():
    """
    Test function for running LLaVA model inference
    """
    # 1. Test setup
    question = "What objects do you see in this image?"
    modality = "image"

    # 2. Get the LLM, prompt, and stop tokens
    llm, prompt, stop_token_ids = run_llava(question, modality)

    # 3. Create or load a test image
    # Option 1: Create a simple test image
    test_image = Image.new("RGB", (224, 224), color="red")

    # Option 2: Load a sample image from a URL (uncomment if needed)
    # url = "https://raw.githubusercontent.com/llava-org/llava-v1.5-7b/main/images/extreme_ironing.jpg"
    # response = requests.get(url)
    # test_image = Image.open(BytesIO(response.content))

    # 4. Set up sampling parameters
    sampling_params = SamplingParams(
        temperature=0.2, max_tokens=64, stop_token_ids=stop_token_ids
    )

    # 5. Prepare input for generation
    input_data = {"prompt": prompt, "multi_modal_data": {"image": test_image}}

    # 6. Generate response
    outputs = llm.generate(input_data, sampling_params=sampling_params)

    # 7. Print results
    for output in outputs:
        generated_text = output.outputs[0].text
        print(f"Question: {question}")
        print(f"Generated response: {generated_text}")
        print("-" * 50)


if __name__ == "__main__":
    try:
        test_run_llava()
    except Exception as e:
        print(f"An error occurred: {str(e)}")

Can anyone please help?
