System Info
- CPU: Intel Xeon Platinum 8352V (144) @ 3.500GHz, x86
- Memory: 1031689MiB
- GPU: RTX 4090 * 8
- Libraries:
  tensorrt 10.7.0
  tensorrt_cu12 10.7.0
  tensorrt-cu12-bindings 10.7.0
  tensorrt-cu12-libs 10.7.0
  tensorrt-llm 0.16.0
- NVIDIA driver: Driver Version: 550.135, CUDA Version: 12.4
- OS: Ubuntu 22.04.5 LTS x86_64
Who can help?
No response
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
I ran the trtllm-serve command like this:
trtllm-serve /home/lz/tensorrt/build/Qwen2.5-7B-Instructtrt_engines/weight_only/1-gpu \
  --tokenizer /home/lz/tensorrt/models/Qwen2.5-7B-Instruct \
  --max_batch_size 128 --max_num_tokens 4096 --max_seq_len 4096 \
  --kv_cache_free_gpu_memory_fraction 0.95
But there is no output except:
[TensorRT-LLM] TensorRT-LLM version: 0.16.0
No errors, no warnings, and no port is ever bound.
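As a sanity check, this is the kind of probe I used to confirm that nothing is listening. It is a minimal sketch that assumes the server would be reachable at localhost:8000; adjust the host/port if your server is started with a different address.

```python
import socket

def port_is_listening(host: str = "localhost", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Assumed default address for the server; change if you configured another one.
    print("server listening:", port_is_listening())
```

This always prints `server listening: False` for me, which matches the observation that the process prints the version line and nothing else.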
But the engine runs fine with the example test script:
python3 /home/lz/TensorRT-LLM/examples/run.py --input_text "你好,请问你叫什么?" \
  --max_output_len=50 \
  --tokenizer_dir /home/lz/tensorrt/models/Qwen2.5-7B-Instruct \
  --engine_dir=/home/lz/tensorrt/build/Qwen2.5-7B-Instructtrt_engines/weight_only/1-gpu
What can I do to get an OpenAI-API-compatible server running? (The sketch below shows how I would expect to query it once it is up.)
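For reference, this is roughly how I intended to use the server once it starts. It is only a sketch: it assumes the default localhost:8000 address and the OpenAI-compatible /v1/chat/completions route, and the model name in the payload is a placeholder (the actual registered name could be checked via GET /v1/models on a working server).

```python
import json
import urllib.request

# Assumed defaults: localhost:8000 and the OpenAI-compatible chat endpoint.
URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "Qwen2.5-7B-Instruct",  # placeholder; verify against /v1/models
    "messages": [{"role": "user", "content": "Hello, what is your name?"}],
    "max_tokens": 50,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=60) as resp:
    body = json.loads(resp.read())
    print(body["choices"][0]["message"]["content"])
```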
Expected behavior
Shouldn't it output more information, e.g. that the server has started and is listening?
actual behavior
Nothing is printed except the version line.
additional notes
Is this a problem with Qwen2.5-7B? I'd appreciate any help you can give.