Phi 3.5 vision (4B model) #637

CheeseAndMeat · 2024-10-08T17:42:30Z

Model description

Lorax's list of officially supported models does not include any vision model. This is a big gap for a very successful product.
With Lorax as a critical component of our tech stack, the lack of a clear option for image-based language models is a big risk on our end. Can the Lorax team please prioritize onboarding Phi3.5 vision, a state-of-the-art SLM with vision? Much appreciated.

https://huggingface.co/microsoft/Phi-3.5-vision-instruct

Open source status

The model implementation is available
The model weights are available

Provide useful links for the implementation

No response

tgaddair · 2024-10-08T20:19:34Z

Hi @CheeseAndMeat, thanks for raising this issue. There are two things here for us to do:

1. Add support for Phi 3.5 Vision, which we can certainly do
2. Update our docs for VLMs, as we do now support both Llava Next and Llama 3.2 Vision models

CheeseAndMeat · 2024-10-08T21:58:21Z

@tgaddair I really appreciate the prompt follow-up :)
1- Phi3.5 Vision outperformed Llama 3.2 Vision in our testing... We are really impressed with it!
2- Same for Phi3.5 MoE: it is much better than both Mixtral and Llama 3.2, so it would be great to have it on the roadmap as well.
Thanks again!

tgaddair added the "enhancement" (New feature or request) label · Oct 8, 2024
