# Llama 3.x multimodal support for evaluations and benchmarking
## Change Log
- add Llama 3.2 Vision image input support to the utils prompt generation and benchmarking scripts (see the multimodal prompt sketch after this list)
- add MMMU with support from https://github.com/tstescoTT/lm-evaluation-harness/tree/tstesco/add-local-multimodal
- address #73 with improved MESH_DEVICE handling in run_vllm_api_server.py::ensure_mesh_device
- fix #62 with run_vllm_api_server.py::register_vllm_models
- rename batch_size -> max_concurrent in client-side scripts to clarify that they only cap the number of concurrent requests and do not set the actual model batch size (see the concurrency sketch after this list)
- add missing trailing newline at end of shell scripts
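
The image input path follows the OpenAI-compatible multimodal chat format that vLLM can serve. Below is a minimal sketch of the kind of prompt payload the updated utils could generate, assuming a local vLLM server at `http://localhost:8000`; the helper name, endpoint, and model name are illustrative placeholders, not the actual utils API.

```python
import base64
import requests

# Hypothetical helper: build an OpenAI-style multimodal chat message with an
# embedded base64 image, roughly what the prompt-generation utils would emit.
def build_image_prompt(image_path: str, text: str) -> list:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": text},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ]

# Send the prompt to an OpenAI-compatible vLLM server (URL/model are placeholders).
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
        "messages": build_image_prompt("example.jpg", "Describe this image."),
        "max_tokens": 128,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```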
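The max_concurrent rename reflects that client scripts only bound the number of in-flight requests; the server schedules its own batches independently. A sketch of that semantics using an asyncio semaphore (endpoint and payload fields are illustrative, not the benchmarking script's actual interface):

```python
import asyncio
import aiohttp

async def send_request(session, sem, payload):
    # The semaphore caps concurrent in-flight requests on the client side;
    # it does not control the server-side model batch size.
    async with sem:
        async with session.post(
            "http://localhost:8000/v1/completions", json=payload
        ) as resp:
            return await resp.json()

async def run(prompts, max_concurrent=32):
    sem = asyncio.Semaphore(max_concurrent)
    async with aiohttp.ClientSession() as session:
        payloads = [
            {"model": "placeholder-model", "prompt": p, "max_tokens": 64}
            for p in prompts
        ]
        return await asyncio.gather(
            *(send_request(session, sem, pl) for pl in payloads)
        )

# Example: 100 prompts, at most 8 in flight at any time.
# results = asyncio.run(run(["Hello"] * 100, max_concurrent=8))
```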
It's easy to misconfigure the MESH_DEVICE environment variable, and the resulting failures are difficult for users to debug, since each model implementation may require a different MESH_DEVICE configuration.
Some explanation is given in https://github.com/tenstorrent/vllm/tree/dev/tt_metal#running-the-offline-inference-example
The model configs for the tt-metal Llama 3.x model implementations use MESH_DEVICE to set key mesh_device settings: https://github.com/tenstorrent/tt-metal/blob/main/models/demos/llama3/tt/model_config.py#L84
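
As a sketch of what failing fast on a bad value could look like: the function name ensure_mesh_device comes from the change log above, but the set of valid values and the fallback default here are illustrative assumptions, not the actual implementation (the real list depends on the tt-metal model_config linked above).

```python
import os

# Illustrative set of MESH_DEVICE values; the real set is defined by the
# tt-metal model configs and is an assumption here.
VALID_MESH_DEVICES = {"N150", "N300", "T3K", "TG"}

def ensure_mesh_device(default: str = "N300") -> str:
    """Validate MESH_DEVICE at startup so a misconfiguration fails loudly
    instead of surfacing later as an obscure model-initialization error."""
    mesh_device = os.environ.get("MESH_DEVICE")
    if mesh_device is None:
        print(f"MESH_DEVICE not set, defaulting to {default}")
        os.environ["MESH_DEVICE"] = default
        return default
    if mesh_device not in VALID_MESH_DEVICES:
        raise ValueError(
            f"MESH_DEVICE={mesh_device!r} is not one of "
            f"{sorted(VALID_MESH_DEVICES)}; check the model's "
            "mesh_device requirements."
        )
    return mesh_device
```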