We have categorized our roadmap into 8 themes: Broad Model Support, Regular Update, More RL Algorithms Support, Dataset Coverage, Plugin Support, Scaling Up RL, More LLM Infrastructure Support, and Wide Hardware Coverage.
Broad Model Support
To add a new model in veRL, the model should satisfy the following requirements:
The model is supported in both vLLM and Hugging Face transformers. If so, you can use the dummy_hf load format directly to run the new model.
[Optional for DTensor] For the FSDP backend, implement the dtensor_weight_loader for the model to transfer actor weights from the FSDP checkpoint to the vLLM model. See the FSDP Document for more information.
For the Megatron backend, users need to implement a ParallelModel similar to modeling_llama_megatron.py, implement the corresponding checkpoint_utils to load checkpoints from Hugging Face, and implement the megatron_weight_loader to transfer actor weights from the ParallelModel directly to the vLLM model. See the Megatron-LM Document for more information.
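As a rough illustration of what a weight loader has to do, the sketch below copies actor parameters into a rollout model whose attention projections are fused into one parameter. It uses plain Python lists in place of tensors. The function name `load_actor_weights`, the parameter names, and the fused `qkv_proj` layout are assumptions made for illustration; they are not veRL's or vLLM's actual API.

```python
from typing import Dict, List

# Hypothetical mapping: (fused rollout name, actor shard name, slot in the fused weight).
STACKED_PARAMS = [
    ("qkv_proj", "q_proj", 0),
    ("qkv_proj", "k_proj", 1),
    ("qkv_proj", "v_proj", 2),
]


def load_actor_weights(
    actor_weights: Dict[str, List[float]],
    rollout_weights: Dict[str, List[List[float]]],
) -> None:
    """Copy actor weights into the rollout model in place (toy example)."""
    for name, weight in actor_weights.items():
        for fused, shard, index in STACKED_PARAMS:
            if shard in name:
                # Fused parameter: write this shard into its slot.
                rollout_weights[name.replace(shard, fused)][index] = list(weight)
                break
        else:
            # Unfused parameter: names match one to one, stored as a single shard.
            rollout_weights[name] = [list(weight)]


# Toy actor state with made-up parameter names and values.
actor = {
    "layers.0.self_attn.q_proj.weight": [1.0, 2.0],
    "layers.0.self_attn.k_proj.weight": [3.0],
    "layers.0.self_attn.v_proj.weight": [4.0],
    "layers.0.mlp.down_proj.weight": [5.0],
}
# The rollout model pre-allocates three slots for the fused q/k/v weight.
rollout = {"layers.0.self_attn.qkv_proj.weight": [[], [], []]}
load_actor_weights(actor, rollout)
```

A real loader would additionally have to handle tensor-parallel sharding and device placement, which this sketch leaves out.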
More RL Algorithms Support
Make sure the algorithms can converge on some math datasets (e.g., GSM8k)
GRPO
Online DPO
Safe-RLHF (multiple reward models)
ReMax
Dataset Coverage
APPS (Code Generation)
codecontests (Code Generation)
TACO (Code Generation)
Math-Shepherd (Math)
competition_math (Math)
Plugin Support
Integrate SandBox and its corresponding datasets for Code Generation tasks
Scaling Up RL
Integrate Ray Compiled Graphs (aDAGs) to speed up data transfer
Support FSDP HybridShard
Context Parallel
Ring Attention
DeepSpeed Ulysses
Aggressive offload techniques for all models
Support vLLM rollout with a larger TP size than the actor model
Support pipeline parallelism in rollout generation (in vLLM or other LLM serving infra)
More LLM Infrastructure Support
LLM Training Infrastructure
Support TorchTitan for TP + PP parallelism
Support VeScale for Auto-Parallelism training
LLM Serving Infrastructure
At present, our project supports vLLM using the SPMD execution paradigm. This means we've eliminated the need for a standalone single-controller process (known as LLMEngine) by integrating its functionality directly into the multiple worker processes, making the system SPMD.
Investigate how the one-controller process + SPMD architecture can be integrated into veRL's existing WorkerGroup design.
Support TensorRT-LLM for rollout generation
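To make the SPMD idea mentioned above concrete, here is a minimal sketch of the general pattern: every worker runs the same function and derives its own share of the work from its rank, so no central controller hands out prompts. The names `spmd_generate` and `fake_generate` are hypothetical, and the loop over ranks stands in for separate worker processes. This illustrates the paradigm only and is not how veRL or vLLM is implemented.

```python
from typing import List


def fake_generate(prompt: str) -> str:
    """Stand-in for a model forward pass (hypothetical)."""
    return prompt.upper()


def spmd_generate(rank: int, world_size: int, prompts: List[str]) -> List[str]:
    """Same program on every worker: pick this rank's shard and process it."""
    shard = prompts[rank::world_size]  # strided shard, decided locally from the rank
    return [fake_generate(p) for p in shard]


prompts = ["a", "b", "c", "d", "e"]
world_size = 2
# In a real deployment each rank is its own process; here a loop simulates them.
outputs = [spmd_generate(rank, world_size, prompts) for rank in range(world_size)]
```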
Wide Hardware Coverage
Supporting a new hardware type in our project involves the following requirements:
Ray compatibility: The hardware type must be supported by the Ray framework, allowing it to be recognized and managed through the ray.util.placement_group functionality.
LLM infra and transformers support: To leverage the new hardware effectively, it is crucial that both LLM infra (e.g., vLLM, torch, Megatron-LM and others) and the transformers library provide native support for the hardware type.
CUDA kernel replacement: We need to replace the CUDA kernels currently used in FSDP and Megatron-LM with the corresponding kernels specific to the new hardware.
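The Ray compatibility requirement can be pictured with the sketch below, which builds the resource bundles that a placement group would reserve for a given accelerator type. The helper `make_bundles` and the resource key `"NPU"` are hypothetical. Only the bundle shape (a list of resource dictionaries) follows Ray's placement group convention, and the Ray call itself is left as a comment so the snippet runs without a cluster.

```python
from typing import Dict, List


def make_bundles(
    num_workers: int, accelerator: str = "GPU", cpus_per_worker: int = 1
) -> List[Dict[str, float]]:
    """Build one resource bundle per worker for the given accelerator resource key."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    return [{"CPU": cpus_per_worker, accelerator: 1} for _ in range(num_workers)]


# "NPU" is an assumed custom resource name that the cluster would have to advertise.
bundles = make_bundles(2, accelerator="NPU")

# With Ray installed and such a cluster running, the bundles could be reserved with:
#   from ray.util.placement_group import placement_group
#   pg = placement_group(bundles, strategy="PACK")
```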
Regular Update
[...] position_ids to support remove padding in transformers models (transformers >= v4.45): [misc] feat: support rmpad/data-packing in FSDP with transformers #91
[...] resource_pool colocate): [misc] fix: weak reference of WorkerDict in RayTrainer #65
[...] resource_pool.
[...] ParallelModel [...]: Is non-RmPad version model and RmPad version model interchangeable? #20