[Roadmap] veRL Development Roadmap #22

Open
2 of 33 tasks
PeterSH6 opened this issue Nov 22, 2024 · 0 comments

Themes

We have organized the roadmap into 8 themes: Broad Model Support, Regular Update, More RL Algorithms Support, Dataset Coverage, Plugin Support, Scaling Up RL, More LLM Infrastructure Support, and Wide Hardware Coverage.

Broad Model Support

To add a new model to veRL, the following requirements should be met:

  1. The model is supported in both vLLM and Hugging Face transformers. In that case, you can run the new model directly with the dummy_hf load format.
  2. [Optional for DTensor] For the FSDP backend, implement a dtensor_weight_loader for the model to transfer actor weights from the FSDP checkpoint to the vLLM model. See the FSDP Document for more information.
  3. For the Megatron backend, implement a ParallelModel similar to modeling_llama_megatron.py, the corresponding checkpoint_utils to load checkpoints from Hugging Face, and a megatron_weight_loader to transfer actor weights from the ParallelModel directly to the vLLM model. See the Megatron-LM Document for more information.
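
As a rough illustration of what a weight loader has to deal with, the sketch below plans a name mapping from actor parameters to rollout parameters. It assumes Llama-style naming, where the training-side model keeps q_proj, k_proj, v_proj, gate_proj and up_proj as separate modules while the vLLM-side model fuses them into qkv_proj and gate_up_proj. The helper names (`map_actor_param`, `plan_weight_transfer`) are hypothetical and are not veRL or vLLM APIs; a real loader would also copy tensor data and handle sharding.

```python
# Hypothetical sketch: names and structure are illustrative only,
# not the actual dtensor_weight_loader / megatron_weight_loader code.

# (fused module name on the rollout side, separate name on the actor side, shard id)
STACKED_PARAMS = [
    ("qkv_proj", "q_proj", "q"),
    ("qkv_proj", "k_proj", "k"),
    ("qkv_proj", "v_proj", "v"),
    ("gate_up_proj", "gate_proj", 0),
    ("gate_up_proj", "up_proj", 1),
]


def map_actor_param(name):
    """Return (rollout_param_name, shard_id) for one actor parameter name.

    shard_id is None when the parameter maps one-to-one.
    """
    for fused, separate, shard_id in STACKED_PARAMS:
        # Match a whole dotted component so "up_proj" never matches "gate_up_proj".
        if f".{separate}." in name:
            return name.replace(f".{separate}.", f".{fused}."), shard_id
    return name, None


def plan_weight_transfer(actor_param_names):
    """Group actor parameters by the rollout parameter they are loaded into."""
    plan = {}
    for name in actor_param_names:
        target, shard_id = map_actor_param(name)
        plan.setdefault(target, []).append((name, shard_id))
    return plan
```

For example, calling `plan_weight_transfer` on the q_proj, k_proj and v_proj weights of one attention layer yields a single qkv_proj entry with three source shards.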

Regular Update

More RL Algorithms Support

Each algorithm should be verified to converge on math datasets (e.g., GSM8K).

  • GRPO
  • Online DPO
  • Safe-RLHF (multiple reward models)
  • ReMax
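
For context on the first item, the core of GRPO is a group-relative advantage: several responses are sampled for the same prompt, and each response's reward is normalized by the mean and standard deviation of its group. The snippet below is a minimal sketch of that normalization only. The function name and the `eps` parameter are illustrative assumptions, the population standard deviation is one possible choice, and this is not veRL's implementation.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's sampled responses within the group.

    eps (an assumed small constant) avoids division by zero when every
    response in the group receives the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]
```

With binary correctness rewards such as those used for GSM8K-style answers, correct responses get positive advantages and incorrect ones negative. A group where all responses score the same gets zero advantage everywhere.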

Dataset Coverage

  • APPS (Code Generation)
  • codecontests (Code Generation)
  • TACO (Code Generation)
  • Math-Shepherd (Math)
  • competition_math (Math)

Plugin Support

  • Integrate SandBox and its corresponding datasets for Code Generation tasks

Scaling Up RL

  • Integrate Ray Compiled Graphs (aDAGs) to speed up data transfer
  • Support FSDP HybridShard
  • Context Parallel
    • Ring Attention
    • DeepSpeed Ulysses
  • Aggressive offload techniques for all models
  • Support vLLM rollout with a larger TP size than the actor model
  • Support pipeline parallelism in rollout generation (in vLLM or other LLM serving infra)

More LLM Infrastructure Support

LLM Training Infrastructure

  • Support TorchTitan for TP + PP parallelism
  • Support VeScale for Auto-Parallelism training

LLM Serving Infrastructure

At present, our project supports vLLM using the SPMD execution paradigm. This means we've eliminated the need for a standalone single-controller process (known as LLMEngine) by integrating its functionality directly into the multiple worker processes, making the system SPMD.
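
To make the SPMD idea concrete, here is a toy sketch, not vLLM or veRL code: every rank runs the same function on the same input, owns a slice of the weights determined only by its rank, and the partial results are combined with an all-reduce. The ranks are simulated sequentially in one process and the function names are hypothetical. A real deployment would run one process per device and use a collective communication library.

```python
def spmd_step(rank, world_size, weights, activations):
    """Same program on every rank: compute a partial dot product over this rank's slice."""
    n = len(weights)
    chunk = (n + world_size - 1) // world_size  # ceil division
    lo, hi = rank * chunk, min(n, (rank + 1) * chunk)
    return sum(w * x for w, x in zip(weights[lo:hi], activations[lo:hi]))


def all_reduce_sum(partials):
    """Stand-in for a collective all-reduce: every rank would receive this sum."""
    return sum(partials)


def run_spmd(world_size, weights, activations):
    # No controller decides what each rank does; the rank index alone determines it.
    partials = [spmd_step(r, world_size, weights, activations) for r in range(world_size)]
    return all_reduce_sum(partials)
```

The result is independent of `world_size`, which is the property that lets the same worker program run unchanged at different tensor-parallel sizes.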

Wide Hardware Coverage

Supporting a new hardware type in our project involves the following requirements:

  1. Ray compatibility: The hardware type must be supported by the Ray framework, allowing it to be recognized and managed through the ray.util.placement_group functionality.
  2. LLM infra and transformers support: To leverage the new hardware effectively, it is crucial that both LLM infra (e.g., vLLM, torch, Megatron-LM and others) and the transformers library provide native support for the hardware type.
  3. CUDA kernel replacement: We need to replace the CUDA kernels currently used in FSDP and Megatron-LM with the corresponding kernels specific to the new hardware.

PeterSH6 pinned this issue Nov 22, 2024