trtllm-build llama3.1-8b failed #2688

Open
765500005 opened this issue Jan 14, 2025 · 5 comments
Assignees: nv-guomingz
Labels: Investigating, LLM API/Workflow, triaged (Issue has been triaged by maintainers)

Comments

@765500005

trtllm-build --checkpoint_dir ./tllm_checkpoint_2gpu_tp2 \
    --output_dir ./tmp/llama/7B/trt_engines/fp16/2-gpu/ \
    --context_fmha enable \
    --remove_input_padding enable \
    --gpus_per_node 8 \
    --gemm_plugin auto

[TRT] [E] IBuilder::buildSerializedNetwork: Error Code 4: Internal Error (Internal error: plugin node LLaMAForCausalLM/transformer/layers/0/attention/wrapper_L562/gpt_attention_L5483/PLUGIN_V2_GPTAttention_0 requires 210571452800 bytes of scratch space, but only 47697362944 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
)

I have 8 GPUs with 46 GB each, but this error still occurs. Is this issue related to the workspace size? How can I increase it?
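
For reference, the setMemoryPoolLimit() named in the error is a method on TensorRT's builder config, which trtllm-build creates and manages internally. A minimal sketch of the equivalent TensorRT Python API, assuming a standalone TensorRT build script rather than the trtllm-build CLI (the 44 GiB value is an arbitrary example, not a recommendation):

import tensorrt as trt

# Sketch only: trtllm-build constructs this builder config itself,
# so this applies when driving TensorRT's Python API directly.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()
# Raise the workspace memory-pool limit (example value: 44 GiB).
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 44 * (1 << 30))

Note that the plugin here requests about 196 GiB of scratch space, which no single 46 GB GPU can provide, so raising the limit alone cannot satisfy it; shrinking the request (e.g. via a smaller --max_batch_size, as suggested below) is the practical fix.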

@nv-guomingz
Collaborator

Hi @765500005, may I know the command you used to generate the ./tllm_checkpoint_2gpu_tp2 folder?

github-actions bot added the triaged and Investigating labels on Jan 14, 2025
@765500005
Author

> Hi @765500005, may I know the command you used to generate the ./tllm_checkpoint_2gpu_tp2 folder?

Hi @nv-guomingz, this is my command:

python convert_checkpoint.py --model_dir /models/Meta-Llama-3.1-8B-Instruct \
    --output_dir ./tllm_checkpoint_2gpu_tp2 \
    --dtype float16 \
    --tp_size 2

@765500005
Author

Hi @nv-guomingz. Excuse me, could you help me?

@nv-guomingz
Collaborator

Hi @765500005 thanks for your patience.

I forgot to ask you to provide your software version.
However, I can build successfully on my H100 node.

You may add the additional parameter below to reduce the memory requirement:
--max_batch_size 8 -> 1
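
For illustration, the rebuilt command with the reduced batch size would look like the sketch below; every flag other than --max_batch_size is carried over unchanged from the original command in this issue:

trtllm-build --checkpoint_dir ./tllm_checkpoint_2gpu_tp2 \
    --output_dir ./tmp/llama/7B/trt_engines/fp16/2-gpu/ \
    --context_fmha enable \
    --remove_input_padding enable \
    --gpus_per_node 8 \
    --gemm_plugin auto \
    --max_batch_size 1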

nv-guomingz self-assigned this on Jan 21, 2025
@765500005
Author

Name: tensorrt_llm
Version: 0.17.0.dev2024121700
Summary: TensorRT-LLM: A TensorRT Toolbox for Large Language Models
Home-page: https://github.com/NVIDIA/TensorRT-LLM
Author: NVIDIA Corporation
Author-email:
License: Apache License 2.0
Location: /usr/local/lib/python3.12/dist-packages

My node consists of 8 NVIDIA L20 GPUs.

By setting --max_batch_size 1, I have built it successfully, thanks!
@nv-guomingz
