trtllm-build llama3.1-8b failed #2688

Open
765500005 opened this issue Jan 14, 2025 · 5 comments
Assignees: nv-guomingz
Labels: Investigating, LLM API/Workflow, triaged (Issue has been triaged by maintainers)

Comments

@765500005

trtllm-build --checkpoint_dir ./tllm_checkpoint_2gpu_tp2 \
    --output_dir ./tmp/llama/7B/trt_engines/fp16/2-gpu/ \
    --context_fmha enable \
    --remove_input_padding enable \
    --gpus_per_node 8 \
    --gemm_plugin auto

[TRT] [E] IBuilder::buildSerializedNetwork: Error Code 4: Internal Error (Internal error: plugin node LLaMAForCausalLM/transformer/layers/0/attention/wrapper_L562/gpt_attention_L5483/PLUGIN_V2_GPTAttention_0 requires 210571452800 bytes of scratch space, but only 47697362944 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
)

I have 8 GPUs with 46 GB each, but this error still occurs. Is this issue related to the workspace size? How can I increase it?
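
For reference, the setMemoryPoolLimit() named in the error is a method on TensorRT's builder config, which trtllm-build creates and manages internally. A minimal sketch of the equivalent TensorRT Python API, assuming a standalone TensorRT build script rather than the trtllm-build CLI (the 44 GiB value is an arbitrary example, not a recommendation):

import tensorrt as trt

# Sketch only: trtllm-build constructs this builder config itself,
# so this applies when driving TensorRT's Python API directly.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()
# Raise the workspace memory-pool limit (example value: 44 GiB).
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 44 * (1 << 30))

Note that the plugin here requests about 196 GiB of scratch space, which no single 46 GB GPU can provide, so raising the limit alone cannot satisfy it; shrinking the request (e.g. via a smaller --max_batch_size, as suggested below) is the practical fix.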

@nv-guomingz
Collaborator

Hi @765500005, may I know the command you used to generate the ./tllm_checkpoint_2gpu_tp2 folder?

github-actions bot added the triaged and Investigating labels on Jan 14, 2025
@765500005
Author

> Hi @765500005, may I know the command you used to generate the ./tllm_checkpoint_2gpu_tp2 folder?

Hi @nv-guomingz, this is my command:

python convert_checkpoint.py --model_dir /models/Meta-Llama-3.1-8B-Instruct \
    --output_dir ./tllm_checkpoint_2gpu_tp2 \
    --dtype float16 \
    --tp_size 2

@765500005
Author

Hi @nv-guomingz. Excuse me, could you help me?

@nv-guomingz
Collaborator

Hi @765500005 thanks for your patience.

I forgot to ask you to provide your software version.
However, I can build successfully on my H100 node.

You may add the additional parameter below to reduce the memory requirement:
--max_batch_size 8 -> 1
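
For illustration, the rebuilt command with the reduced batch size would look like the sketch below; every flag other than --max_batch_size is carried over unchanged from the original command in this issue:

trtllm-build --checkpoint_dir ./tllm_checkpoint_2gpu_tp2 \
    --output_dir ./tmp/llama/7B/trt_engines/fp16/2-gpu/ \
    --context_fmha enable \
    --remove_input_padding enable \
    --gpus_per_node 8 \
    --gemm_plugin auto \
    --max_batch_size 1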

nv-guomingz self-assigned this on Jan 21, 2025
@765500005
Author

Name: tensorrt_llm
Version: 0.17.0.dev2024121700
Summary: TensorRT-LLM: A TensorRT Toolbox for Large Language Models
Home-page: https://github.com/NVIDIA/TensorRT-LLM
Author: NVIDIA Corporation
Author-email:
License: Apache License 2.0
Location: /usr/local/lib/python3.12/dist-packages

My node consists of 8 NVIDIA L20 GPUs.

By setting --max_batch_size 1, I have built it successfully, thanks!
@nv-guomingz
