[misc] fix reward model issue with TokenClassification model and support running particular steps instead of epochs #99

PeterSH6 · 2025-01-12T14:54:52Z

This PR:

Adds a CI job with a model-based reward to ensure that path always stays runnable
Supports training for a specified number of steps instead of a number of epochs
Fixes how the reward model selects the reward at the EOS token after switching from AutoModelForSequenceClassification to AutoModelForTokenClassification
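To illustrate the reward-model item: a sequence-classification head returns one score per sequence, whereas a token-classification head returns one score per token, so the caller has to pick the position to read the reward from. The following is a minimal pure-Python sketch of that selection, not the code in this PR. It assumes the last non-padding token of each sequence is the EOS token, uses plain lists in place of tensors, and the names `last_valid_index` and `sequence_rewards` are hypothetical.

```python
from typing import List


def last_valid_index(attention_mask: List[int]) -> int:
    """Return the index of the last position whose mask value is 1."""
    for idx in range(len(attention_mask) - 1, -1, -1):
        if attention_mask[idx] == 1:
            return idx
    raise ValueError("attention mask contains no valid tokens")


def sequence_rewards(
    token_scores: List[List[float]],
    attention_masks: List[List[int]],
) -> List[float]:
    """Reduce per-token scores to one reward per sequence.

    Assumption: the last unmasked token is the EOS token, so its score
    is taken as the reward for the whole sequence.
    """
    rewards = []
    for scores, mask in zip(token_scores, attention_masks):
        rewards.append(scores[last_valid_index(mask)])
    return rewards


# Two right-padded sequences; padding positions have mask value 0.
token_scores = [[0.1, 0.2, 0.9, 0.0], [0.3, 0.7, 0.0, 0.0]]
attention_masks = [[1, 1, 1, 0], [1, 1, 0, 0]]
example_rewards = sequence_rewards(token_scores, attention_masks)
```

Because the index is derived from the attention mask rather than from the sequence length, the same selection works for left-padded batches.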

The flash_attention compatibility issue will be fixed in the next PR.
I may need some time to work out how to install different versions of flash_attn more efficiently on the CI machines.
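The step-based training option listed in the description can be sketched as follows. This is a hedged illustration, not the trainer code changed in this PR: the parameter names `steps_per_epoch`, `total_epochs`, and `total_training_steps` are assumptions chosen for the example, and the real loop in verl/trainer/ppo/ray_trainer.py may be organized differently.

```python
from typing import Iterator, Optional, Tuple


def resolve_total_steps(
    steps_per_epoch: int,
    total_epochs: int,
    total_training_steps: Optional[int] = None,
) -> int:
    """Use the user-specified step count if given, else derive it from epochs."""
    if total_training_steps is not None:
        return total_training_steps
    return steps_per_epoch * total_epochs


def training_schedule(
    steps_per_epoch: int,
    total_epochs: int,
    total_training_steps: Optional[int] = None,
) -> Iterator[Tuple[int, int]]:
    """Yield (epoch, global_step) pairs until the step limit is reached.

    In this sketch the epoch loop is still the outer bound, so a step
    limit larger than steps_per_epoch * total_epochs has no extra effect.
    """
    limit = resolve_total_steps(steps_per_epoch, total_epochs, total_training_steps)
    global_step = 0
    for epoch in range(total_epochs):
        for _ in range(steps_per_epoch):
            if global_step >= limit:
                return  # step budget exhausted, stop mid-epoch
            global_step += 1
            yield epoch, global_step


# With 4 steps per epoch and 3 epochs, a limit of 5 stops inside the second epoch.
limited_run = list(training_schedule(4, 3, total_training_steps=5))
full_run = list(training_schedule(4, 3))
```

A small step limit like this is convenient in CI, where a run only needs to prove that the pipeline executes end to end.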

verl/trainer/ppo/ray_trainer.py

PeterSH6 added 8 commits January 11, 2025 21:37

support user specify training steps

332e594

fix typo

fea2934

update ci

c8804f0

add ci

257c22c

fix reward model and write more ci script

f2f5332

update ci

a230d38

lint

0b6a130

align

3563f24

PeterSH6 requested a review from vermouth1992 January 12, 2025 14:54

PeterSH6 mentioned this pull request Jan 12, 2025

[misc] feat: support different flash_attn versions with variable num returns #100

Merged

vermouth1992 reviewed Jan 13, 2025

verl/trainer/ppo/ray_trainer.py (outdated, resolved)

PeterSH6 added 2 commits January 13, 2025 11:54

delete post training val

5677119

fix script

0725879

vermouth1992 approved these changes Jan 13, 2025

vermouth1992 merged commit a0e8ed2 into volcengine:main Jan 13, 2025
8 checks passed
