This PR, which tests SPMD vLLM, is in its early stages and is not ready to merge.
I am only here to confirm that main-branch vLLM now works with verl's test case (run_fsdp_vllm.py). Below are some baseline comparisons of weight sync duration.
Configuration: 8*L20 GPUs, TP=4
A. Across process (broadcast/gloo): run
python test_sync_weight_openrlhf.py
with vllm_sync_backend = "gloo"; rank 7 broadcasts weights to ranks 0-3 with the gloo backend.
B. Across process (broadcast/nccl): run
python test_sync_weight_openrlhf.py
with vllm_sync_backend = "nccl"; rank 7 broadcasts weights to ranks 0-3 with the nccl backend.
C. FSDP+vLLM: the original test case, run 4 workers with
torchrun --nproc-per-node=4 run_fsdp_vllm.py
D. FSDP+vLLM(spmd): using vllm='0.6.6.post2.dev252+g8027a724', run 4 workers with
torchrun --nproc-per-node=4 run_fsdp_vllm_spmd.py
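For reference, the sketch below shows one way a sync duration could be measured for setups like A-D. It is not the timing code used in this PR: time_sync, fake_weight_sync, and barrier_fn are hypothetical names, and the sleep is only a stand-in for the real weight sync.

```python
import time
from statistics import mean


def time_sync(sync_fn, repeats=3, barrier_fn=None):
    """Return the wall-clock duration (seconds) of each sync_fn() call.

    barrier_fn is an optional hook called before starting and before
    stopping the clock. For GPU work it could be torch.cuda.synchronize,
    so that asynchronous kernels are included in the measurement.
    """
    durations = []
    for _ in range(repeats):
        if barrier_fn is not None:
            barrier_fn()
        start = time.perf_counter()
        sync_fn()
        if barrier_fn is not None:
            barrier_fn()
        durations.append(time.perf_counter() - start)
    return durations


def fake_weight_sync():
    # Placeholder (assumption): the real benchmark would broadcast or
    # load the model weights here instead of sleeping.
    time.sleep(0.01)


runs = time_sync(fake_weight_sync, repeats=3)
print(f"mean sync time: {mean(runs):.3f}s over {len(runs)} runs")
```

Passing the sync step as a callable keeps the same timer usable for the broadcast-only cases (A, B) and the full FSDP+vLLM sync (C, D).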
The weight sync time (in seconds) is recorded as:
Note that the across-process weight sync only includes the broadcast component (the complete weight sync should also include weight gathering). FSDP+vLLM and FSDP+vLLM(spmd) should perform identically since the sync weight logic remains unchanged. Based on these results, I have two conclusions: