Commit

fix rm and disable val in some ci
PeterSH6 committed Jan 17, 2025
1 parent c851bc2 commit 6581369
Showing 5 changed files with 5 additions and 1 deletion.
1 change: 1 addition & 0 deletions tests/e2e/run_qwen_gsm8k_function_rm_no_rmpad.sh
@@ -32,6 +32,7 @@ python3 -m verl.trainer.main_ppo \
algorithm.kl_ctrl.kl_coef=0.001 \
trainer.critic_warmup=0 \
trainer.logger=['console'] \
+trainer.val_before_train=False \
trainer.project_name='verl_example_gsm8k' \
trainer.experiment_name='qwen_e2e_ci_function_rm' \
trainer.n_gpus_per_node=8 \
1 change: 1 addition & 0 deletions tests/e2e/run_qwen_gsm8k_model_rm.sh
@@ -40,6 +40,7 @@ python3 -m verl.trainer.main_ppo \
algorithm.kl_ctrl.kl_coef=0.001 \
trainer.critic_warmup=0 \
trainer.logger=['console'] \
+trainer.val_before_train=False \
trainer.project_name='verl_example' \
trainer.experiment_name='Qwen2.5-0.5B-ci_hybrid_rm' \
trainer.n_gpus_per_node=8 \
1 change: 1 addition & 0 deletions tests/e2e/run_qwen_gsm8k_model_rm_no_rmpad.sh
@@ -39,6 +39,7 @@ python3 -m verl.trainer.main_ppo \
reward_model.micro_batch_size=16 \
algorithm.kl_ctrl.kl_coef=0.001 \
trainer.critic_warmup=0 \
+trainer.val_before_train=False \
trainer.logger=['console'] \
trainer.project_name='verl_example' \
trainer.experiment_name='Qwen2.5-0.5B-ci_hybrid_rm' \
1 change: 1 addition & 0 deletions tests/e2e/run_qwen_gsm8k_model_rm_ulysses.sh
@@ -42,6 +42,7 @@ python3 -m verl.trainer.main_ppo \
reward_model.micro_batch_size=16 \
algorithm.kl_ctrl.kl_coef=0.001 \
trainer.critic_warmup=0 \
+trainer.val_before_train=False \
trainer.logger=['console'] \
trainer.project_name='verl_example' \
trainer.experiment_name='Qwen2.5-0.5B-ci_hybrid_rm_sp2' \
2 changes: 1 addition & 1 deletion verl/workers/fsdp_workers.py
@@ -954,7 +954,7 @@ def compute_rm_score(self, data: DataProto):
# perform forward computation
with self.ulysses_sharding_manager:
    rm_data = self.ulysses_sharding_manager.preprocess_data(data=rm_data)

    data = self.ulysses_sharding_manager.preprocess_data(data=data)
    micro_batches = rm_data.batch.split(self.config.micro_batch_size)
    output = []
    for micro_batch in micro_batches: