Is there a limit on the largest PPO model size and max context length the package can support to scale out of the box? Is it scalable when we add more GPU nodes to the training job? Thanks!
-
Pasting response from @odelalleau below: NeMo-Aligner relies on Megatron-LM to support various parallelism schemes that allow scaling to very large models. We have yet to push it to its limits, so it is hard to give a reliable answer on max model size and max context length as of today. It is indeed scalable as we add more GPU nodes to the training job, but we have several optimizations in the works that we want to integrate before doing more extensive performance benchmarking.
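For intuition on why this scales with node count, here is a minimal sketch of how Megatron-LM-style parallelism degrees multiply into a total GPU count (plain Python arithmetic; the specific degrees are illustrative assumptions, not NeMo-Aligner defaults):

```python
# Minimal sketch of how Megatron-LM-style parallelism degrees compose.
# All numbers are illustrative assumptions, not NeMo-Aligner defaults.

tensor_parallel = 8    # shard each layer's weights across 8 GPUs (one node)
pipeline_parallel = 4  # split the layer stack into 4 sequential stages
data_parallel = 8      # replicate the (TP x PP) model group 8 times

gpus_per_replica = tensor_parallel * pipeline_parallel  # GPUs holding one model copy
world_size = gpus_per_replica * data_parallel           # GPUs in the whole job

print(f"GPUs per model replica: {gpus_per_replica}")  # 32
print(f"Total GPUs (world size): {world_size}")       # 256

# Adding nodes usually grows data_parallel: throughput scales roughly
# linearly while each replica's memory footprint stays fixed, which is
# why per-model size limits depend on TP/PP rather than on node count.
```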
-
Hi, we did PPO on a Llama-70B model with 4k context length. In terms of GPU count, we used 32x8 GPUs for the actor and 8x8 GPUs for the critic. Try our NV-Llama2-70B-RLHF model on NVIDIA AI Foundation for free.
Originally posted by @panyi121 in #70 (comment)
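To put concrete numbers on that split, a small sketch (assuming "NxM" above means N nodes of M GPUs each, which is an interpretation on my part, not confirmed in the thread):

```python
# Sketch of the actor/critic GPU split described above, assuming "32x8"
# means 32 nodes x 8 GPUs per node (an interpretation, not confirmed).

gpus_per_node = 8
actor_nodes, critic_nodes = 32, 8

actor_gpus = actor_nodes * gpus_per_node    # GPUs serving the PPO actor
critic_gpus = critic_nodes * gpus_per_node  # GPUs serving the PPO critic

print(f"Actor GPUs:  {actor_gpus}")                              # 256
print(f"Critic GPUs: {critic_gpus}")                             # 64
print(f"Actor:critic GPU ratio: {actor_gpus // critic_gpus}:1")  # 4:1

# The actor (policy) both generates rollouts and trains on them, while
# the critic only scores rollouts, which is one reason the actor would
# get the larger share of GPUs in a split like this 4:1 one.
```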