hkust-nlp / simpleRL-reason



Issues: hkust-nlp/simpleRL-reason


18 Open 7 Closed


Issues list

worker process died

#25 opened Feb 3, 2025 by ypwang61

Why a critic model is needed?

#22 opened Feb 1, 2025 by xshadowxx

Have you tried to train the model by Lora? It requires less computation resources.

#21 opened Jan 31, 2025 by whpy

Why use PPO instead of GRPO?

#18 opened Jan 30, 2025 by Lineark

RuntimeError: Connection closed by peer when training on a single node

#17 opened Jan 29, 2025 by daxiongshu

Where is the long CoT data and preprocess script?

#16 opened Jan 28, 2025 by butterluo

Docker Container for Reproducing

#15 opened Jan 28, 2025 by Kartik14

what is used for critic model

#13 opened Jan 27, 2025 by Allenpai1

rStar-Math-7B row in the first table seems to be inaccurate?

#12 opened Jan 27, 2025 by rht

PoT style response?

#11 opened Jan 27, 2025 by SparkJiao

Could you point to me how do you preprocess the chat message?

#10 opened Jan 27, 2025 by mickelliu

Please give a repro on ray cluster setup?

#9 opened Jan 26, 2025 by kouroshHakha

Help reproducing training run

#8 opened Jan 26, 2025 by ctjlewis

Thanks for your fast movement and open source spirit

#5 opened Jan 26, 2025 by bendanzzc

Is there a response increase compared to initial length after RL?

#4 opened Jan 26, 2025 by Unakar

the minimum hardware resource configuration required for training?

#3 opened Jan 26, 2025 by tensorflowt

What is the reward?

#2 opened Jan 26, 2025 by yuanyaaa

A stupid question: Why researchers call such methods as "scaling test-time computation"?

#1 opened Jan 26, 2025 by huanranchen
