[WIP] Integrating vllm>=0.7.0 #209

ZSL98 · 2025-02-05T17:09:56Z

This PR integrates vllm>=0.7.0 while preserving:
Backward compatibility: vllm 0.3.1, 0.4.2, 0.5.4, and 0.6.3 remain supported.
Forward compatibility: future vllm releases (>= 0.7.0) are expected to work without manual maintenance for each new release.

So far the integration only supports the FSDP backend. Below are some rough numbers (vllm generation time per iteration, in seconds) for comparison. With eager mode off, generation time is shortened thanks to CUDA graphs. vllm's sleep mode also helps reduce GPU memory pressure.

| Model (generation time, s) | vllm 0.6.3 | vllm 0.7.0 (eager=true) | vllm 0.7.0 (eager=false) |
|---|---|---|---|
| Qwen2-7b | 16 | 13.5 | 12 |
| Qwen2-7b_rm | 14.5 | 11.5 | 10 |
| Qwen2-7b_seq_balance | 99 | 97 | 72 |
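
To put the rough numbers in relative terms, the reduction in generation time from vllm 0.6.3 to vllm 0.7.0 with eager mode off can be computed directly from the table. This is only an illustrative calculation; the `timings` dictionary and `reduction_pct` helper are hypothetical names, and the inputs are the approximate timings reported above.

```python
# Rough vllm generation time per iteration (seconds), copied from the table.
timings = {
    "Qwen2-7b": {"0.6.3": 16.0, "0.7.0_eager": 13.5, "0.7.0_graph": 12.0},
    "Qwen2-7b_rm": {"0.6.3": 14.5, "0.7.0_eager": 11.5, "0.7.0_graph": 10.0},
    "Qwen2-7b_seq_balance": {"0.6.3": 99.0, "0.7.0_eager": 97.0, "0.7.0_graph": 72.0},
}


def reduction_pct(baseline: float, new: float) -> float:
    """Percentage by which `new` is lower than `baseline`, to one decimal."""
    return round(100.0 * (baseline - new) / baseline, 1)


# Compare the 0.6.3 baseline against 0.7.0 with CUDA graphs (eager=false).
reductions = {
    name: reduction_pct(t["0.6.3"], t["0.7.0_graph"])
    for name, t in timings.items()
}

for name, pct in reductions.items():
    print(f"{name}: {pct}% less generation time than vllm 0.6.3")
```

On these numbers the reduction is roughly 25% to 31% across the three scripts, with most of the gain for the seq_balance script coming from turning eager mode off.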

To reproduce

Install vllm with pip3 install vllm==0.7.0, or install a nightly build with pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly.
Apply the following minor modifications (I have reported them upstream in vllm-project/vllm#12782, "[Bug]: Some issues in integrating vllm with verl"):

vllm/distributed/parallel_state.py: comment out the world-size check below:

    if (world_size
            != tensor_model_parallel_size * pipeline_model_parallel_size):
        raise RuntimeError(
            f"world_size ({world_size}) is not equal to "
            f"tensor_model_parallel_size ({tensor_model_parallel_size}) x "
            f"pipeline_model_parallel_size ({pipeline_model_parallel_size})")

vllm/executor/uniproc_executor.py: change local_rank = rank to local_rank = int(os.environ["LOCAL_RANK"])

Then run bash run_qwen2-7b.sh under examples/ppo_trainer.
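
The two patches above can be illustrated without vllm. The sketch below is not vllm code: `check_world_size` and `resolve_local_rank` are hypothetical helpers. It assumes that the trainer launches more ranks than a single vllm instance spans, so the world size is a multiple of TP x PP rather than equal to it, and that the launcher exports `LOCAL_RANK` for each process.

```python
import os
from typing import Mapping, Optional


def check_world_size(world_size: int, tp_size: int, pp_size: int) -> int:
    """Relaxed variant of the strict equality check (an assumption, not vllm's code).

    Returns how many independent inference groups fit in the world.
    """
    group_size = tp_size * pp_size
    if world_size % group_size != 0:
        raise RuntimeError(
            f"world_size ({world_size}) is not a multiple of "
            f"tensor_model_parallel_size ({tp_size}) x "
            f"pipeline_model_parallel_size ({pp_size})")
    return world_size // group_size


def resolve_local_rank(rank: int, env: Optional[Mapping[str, str]] = None) -> int:
    """Prefer LOCAL_RANK from the environment; fall back to the global rank."""
    env = os.environ if env is None else env
    value = env.get("LOCAL_RANK")
    return rank if value is None else int(value)


# 8 training ranks with TP=2, PP=1 form 4 inference groups.
print(check_world_size(8, 2, 1))
# Global rank 9 on a second node maps to local device 1.
print(resolve_local_rank(9, {"LOCAL_RANK": "1"}))
```

Unlike the patch described above, which reads `os.environ["LOCAL_RANK"]` directly and fails if it is unset, this sketch falls back to the global rank so that single-process runs still work.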

Note

The test cases are currently limited, and bug-free operation is not guaranteed at this stage.

PeterSH6 · 2025-02-06T02:45:47Z

verl/third_party/vllm/vllm_spmd/verl_executor.py

+        self.collective_rpc("verl_init_device")
+        self.collective_rpc("load_model")
+
+    def determine_num_available_blocks(self) -> Tuple[int, int]:


I wonder why we need this function. It seems that vLLM's UniProcExecutor already performs an all-reduce with MIN.

I have removed this file.
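
For context on the exchange above: an all-reduce with MIN makes every rank adopt the smallest block count that any rank can afford, so all ranks end up allocating KV caches of the same size. The sketch below simulates that reduction in plain Python rather than with torch.distributed; the function name and the per-rank numbers are made up for illustration.

```python
from typing import List, Tuple


def allreduce_min_blocks(per_rank: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Simulate an element-wise MIN all-reduce over (gpu_blocks, cpu_blocks)."""
    gpu_blocks = min(gpu for gpu, _ in per_rank)
    cpu_blocks = min(cpu for _, cpu in per_rank)
    return gpu_blocks, cpu_blocks


# Hypothetical block counts that three ranks measured independently.
per_rank_blocks = [(2048, 512), (1990, 512), (2101, 480)]

# After the reduction every rank would use the same pair.
agreed = allreduce_min_blocks(per_rank_blocks)
print(agreed)  # (1990, 480)
```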

zhangshulai and others added 18 commits January 17, 2025 16:24

[test] test for vllm-spmd

be4cd50

[test] test for sync weight in OpenRLHF style

d76c04d

[chore] Remove dependencies on vllm<=0.6.3

e20ba1b

[test] Add time profiling on vllm sync weight

ac4c91d

[test] Some formatting changes

6fb4999

Merge branch 'volcengine:main' into zsl/vllm-spmd

b64a473

Merge branch 'volcengine:main' into zsl/vllm-spmd

4fe511a

Merge branch 'volcengine:main' into zsl/vllm-spmd

c77bbec

Add a tiny version of run_qwen2-7b_seq_balance.sh

234a52d

init some files

bc689b6

Merge remote-tracking branch 'upstream/main' into zsl/vllm-spmd

6f55342

update

5a2d526

update

c0a5099

update

4a6d686

support fsdp

6c78554

support vllm>=0.7.0 and fsdp

ef47177

Merge remote-tracking branch 'origin/main' into latest

e8a7487

Merge branch 'volcengine:main' into latest

d71b5a2

ZSL98 mentioned this pull request Feb 5, 2025

[test] Add tests for SPMD vLLM #116

Closed

Merge branch 'volcengine:main' into latest

18ed87f

PeterSH6 reviewed Feb 6, 2025


zhangshulai added 2 commits February 6, 2025 15:53

remove redundant files

a27ee29

update

e936114

ZSL98 marked this pull request as ready for review February 6, 2025 08:54

[test] update run_fsdp_vllm_spmd.py

ffa88ed
