Expected behavior
Starting the container with make => make -C docker release_run
[01/15/2025-01:15:18] [TRT-LLM] [I] Stopping response parsing.
[01/15/2025-01:15:18] [TRT-LLM] [I] Collecting last responses before shutdown.
[01/15/2025-01:15:18] [TRT-LLM] [I] Completed request parsing.
[01/15/2025-01:15:18] [TRT-LLM] [I] Parsing stopped.
[01/15/2025-01:15:18] [TRT-LLM] [I] Request generator successfully joined.
[01/15/2025-01:15:18] [TRT-LLM] [I] Statistics process successfully joined.
[01/15/2025-01:15:18] [TRT-LLM] [I]
===========================================================
= ENGINE DETAILS
===========================================================
Model: meta-llama/Llama-2-7b-hf
Engine Directory: /tmp/meta-llama/Llama-2-7b-hf/tp_1_pp_1
TensorRT-LLM Version: 0.16.0
Dtype: float16
KV Cache Dtype: FP8
Quantization: FP8
Max Sequence Length: 256
===========================================================
= WORLD + RUNTIME INFORMATION
===========================================================
TP Size: 1
PP Size: 1
Max Runtime Batch Size: 1280
Max Runtime Tokens: 2304
Scheduling Policy: Guaranteed No Evict
KV Memory Percentage: 90.00%
Issue Rate (req/sec): 2.8149E+13
===========================================================
= PERFORMANCE OVERVIEW
===========================================================
Number of requests: 3000
Average Input Length (tokens): 128.0000
Average Output Length (tokens): 128.0000
Token Throughput (tokens/sec): 12067.8672
Request Throughput (req/sec): 94.2802
Total Latency (ms): 31820.0387
===========================================================
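As a sanity check on the overview above, the two throughput figures can be recomputed from the request count, the average output length, and the total latency. This is only a sketch: the helper name `throughput` is made up for illustration, and the formula assumes that only output tokens are counted, which is what the reported numbers are consistent with.

```shell
#!/bin/sh
# throughput REQUESTS AVG_OUTPUT_TOKENS TOTAL_LATENCY_MS
# Hypothetical helper (not part of TensorRT-LLM): recomputes the two
# throughput figures from the values printed in the overview.
throughput() {
    awk -v n="$1" -v out="$2" -v ms="$3" 'BEGIN {
        s = ms / 1000                      # total latency in seconds
        printf "tokens/sec=%.2f req/sec=%.2f\n", n * out / s, n / s
    }'
}

# Values copied from the PERFORMANCE OVERVIEW section of the log.
throughput 3000 128 31820.0387
```

With the values from the log this prints `tokens/sec=12067.87 req/sec=94.28`, matching the reported 12067.8672 and 94.2802 after rounding.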
actual behavior
Starting the container with docker run => docker run --rm --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --gpus=device=1 -it -p 8000:8000 -v <path-to-TensorRT-LLM/>:/app/ 89fg611dcfd
[TensorRT-LLM] TensorRT-LLM version: 0.16.0
[01/15/2025-10:23:14] [TRT-LLM] [I] Preparing to run throughput benchmark...
[01/15/2025-10:23:14] [TRT-LLM] [I] Setting up benchmarker and infrastructure.
[01/15/2025-10:23:14] [TRT-LLM] [I] Initializing Throughput Benchmark. [rate=-1 req/s]
[01/15/2025-10:23:14] [TRT-LLM] [I] Ready to start benchmark.
[01/15/2025-10:23:14] [TRT-LLM] [I] Initializing Executor.
[TensorRT-LLM][WARNING] Setting cudaGraphCacheSize to a value greater than 0 without enabling cudaGraphMode has no effect.
[TensorRT-LLM][INFO] Engine version 0.16.0 found in the config file, assuming engine(s) built by new builder API.
/usr/local/lib/python3.12/dist-packages/tensorrt_llm/bin/executorWorker: error while loading shared libraries: libnvinfer_plugin_tensorrt_llm.so: cannot open shared object file: No such file or directory
--------------------------------------------------------------------------
Child job 2 terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
It hangs here for quite a long time. After pressing Ctrl+C:
A request has timed out and will therefore fail:
Operation: LOOKUP: orted/pmix/pmix_server_pub.c:345
Your job may terminate as a result of this problem. You may want to
adjust the MCA parameter pmix_server_max_wait and try again. If this
occurred during a connect/accept operation, you can adjust that time
using the pmix_base_exchange_timeout parameter.
--------------------------------------------------------------------------
Aborted!
--------------------------------------------------------------------------
(null) detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[40030,2],0]
Exit code: 127
--------------------------------------------------------------------------
A request has timed out and will therefore fail:
Operation: LOOKUP: orted/pmix/pmix_server_pub.c:345
Your job may terminate as a result of this problem. You may want to
adjust the MCA parameter pmix_server_max_wait and try again. If this
occurred during a connect/accept operation, you can adjust that time
using the pmix_base_exchange_timeout parameter.
--------------------------------------------------------------------------
Aborted!
--------------------------------------------------------------------------
(null) detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[40030,2],0]
Exit code: 127
--------------------------------------------------------------------------
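The first failure in the log is the dynamic loader not finding libnvinfer_plugin_tensorrt_llm.so, and exit code 127 is the status the loader returns in that case. A minimal diagnostic sketch, to be run inside the container started with docker run: the helper name `missing_libs` is made up for illustration, and the executorWorker path is taken from the error message above.

```shell
#!/bin/sh
# missing_libs BINARY
# Hypothetical helper: prints the names of shared libraries that the
# dynamic loader cannot resolve for BINARY (empty output means none).
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}

# Example usage inside the container (path copied from the error message):
# missing_libs /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bin/executorWorker
#
# To see whether the library exists anywhere in the container at all:
# find / -name 'libnvinfer_plugin_tensorrt_llm.so' 2>/dev/null
```

If the library is present on disk but still reported as missing, the loader search path (for example `LD_LIBRARY_PATH` or the `ldconfig` cache) differs between the two ways of starting the container. Comparing the output in both containers would narrow this down.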
additional notes
=> I need to map the port and mount the directory myself to save time and avoid repeated HF model downloads
Thanks
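One way to keep the port mapping and avoid repeated downloads without mounting anything onto /app/ is sketched below. It rests on assumptions that the logs do not confirm: that the failure is related to the bind mount covering /app/, that the container runs as root (so the default Hugging Face cache is /root/.cache/huggingface), and that /workspace/src is a free path in the image. The helper name `build_run_cmd` is hypothetical, and it only prints the command so it can be inspected before running.

```shell
#!/bin/sh
# build_run_cmd IMAGE SRC_DIR HF_CACHE_DIR
# Hypothetical helper: prints (does not execute) a docker run command that
# keeps the flags from the report but mounts the checkout at /workspace/src
# (an assumed mount point) and the host Hugging Face cache at root's default
# cache location, instead of mounting over /app/.
build_run_cmd() {
    image="$1"      # image ID or tag
    src_dir="$2"    # host path of the TensorRT-LLM checkout
    hf_cache="$3"   # host directory holding downloaded HF models
    printf 'docker run --rm --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --gpus=device=1 -it -p 8000:8000 -v %s:/workspace/src -v %s:/root/.cache/huggingface %s\n' \
        "$src_dir" "$hf_cache" "$image"
}

# Example (paths are placeholders):
# build_run_cmd 89fg611dcfd /path/to/TensorRT-LLM "$HOME/.cache/huggingface"
```

If the benchmark then starts normally, that would suggest the original mount was hiding files the image expects under /app/. If it fails the same way, the cause lies elsewhere.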
Reproduction
Follow the steps from the performance benchmarking link.