Excellent work!
I've encountered a specific issue when attempting to load the 6b model using the following command:
llm = LLM(model=model_name_or_dir, tensor_parallel_size=num_gpus)
where model_name_or_dir is a local path. Unfortunately, this resulted in a ZeroDivisionError. The detailed error message is as follows:
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.2.0-py3.10-linux-x86_64.egg/vllm/entrypoints/llm.py", line 93, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.2.0-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 231, in from_engine_args
    engine = cls(*engine_configs,
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.2.0-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 113, in __init__
    self._init_cache()
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.2.0-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 193, in _init_cache
    num_blocks = self._run_workers(
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.2.0-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 698, in _run_workers
    all_outputs = ray.get(all_outputs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 2547, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ZeroDivisionError): ray::RayWorker.execute_method() (pid=3707, ip=10.33.79.244, actor_id=71ea7c30788a6797b487792c01000000, repr=<vllm.engine.ray_utils.RayWorker object at 0x7fb32ad01030>)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.2.0-py3.10-linux-x86_64.egg/vllm/engine/ray_utils.py", line 32, in execute_method
    return executor(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.2.0-py3.10-linux-x86_64.egg/vllm/worker/worker.py", line 127, in profile_num_available_blocks
    (total_gpu_memory * gpu_memory_utilization - peak_memory) //
ZeroDivisionError: float floor division by zero
However, when I switch to the 13b model, the model loads without any issues and it works normally. Any guidance or insights into this matter would be greatly appreciated. Thank you for your support and dedication to maintaining this project.
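For context on the last frame: the traceback cuts off before the divisor is shown, so the following is only a sketch. It assumes the expression is floor-divided by some per-block cache size that can evaluate to zero for certain model and tensor_parallel_size combinations. The names `num_available_blocks` and `cache_block_size` are hypothetical and are not taken from vLLM's code.

```python
def num_available_blocks(total_gpu_memory, gpu_memory_utilization,
                         peak_memory, cache_block_size):
    """Hypothetical stand-in for the expression shown in the traceback.

    cache_block_size is an assumed name for the divisor, which the
    traceback does not show.
    """
    if cache_block_size <= 0:
        # Fail with a descriptive message instead of a bare ZeroDivisionError.
        raise ValueError(
            f"cache_block_size must be positive, got {cache_block_size}; "
            "check the model config against tensor_parallel_size"
        )
    usable = total_gpu_memory * gpu_memory_utilization - peak_memory
    # usable is a float, so a zero divisor here would raise ZeroDivisionError.
    return int(usable // cache_block_size)


if __name__ == "__main__":
    print(num_available_blocks(1000, 0.5, 100, 100))
    try:
        num_available_blocks(1000, 0.5, 100, 0)
    except ValueError as exc:
        print("guarded:", exc)
```

If this assumption holds, printing the value of the divisor on each worker just before the failing line should show which configuration value collapses to zero for the 6b model but not the 13b one.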
Hi @huoliangyu! Thanks for your valuable feedback.
We've tried to replicate your issue, but everything seems to be working fine. It would help if you could provide more details, such as your complete program and your environment.
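One way to gather the environment details requested above is a short script like the one below. It is a minimal sketch, not an official vLLM tool: the function name `collect_environment` is made up for illustration, and the package list simply mirrors the libraries that appear in the traceback.

```python
import platform
import sys
from importlib import metadata


def collect_environment(packages=("vllm", "torch", "ray")):
    """Return Python, platform, and installed package versions as a dict."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages rather than failing the whole report.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Pasting this output together with the full script, the GPU model and count, and the value passed as tensor_parallel_size should make the report easier to reproduce.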