Performance of SFT #1
Hi, Hanqer! Thank you for your kind words about our work! During our experiments, we found that the parameter scale of the base model is quite important.
We hope this answers your question.
@EliverQ Thanks for your reply! But I still have a concern: deepseek-r1 and o1-mini are both small models (smaller than 34B) with strong reasoning and chain-of-thought ability. Why is imitation learning not effective for small models?
Thank you for your question!
We are currently exploring ways to activate slow-reasoning capabilities through imitation learning and then use reinforcement learning for effective scaling during training. If you're interested, feel free to stay tuned for our upcoming work!
After running SFT on your open-sourced data with qwen2.5-7b-instruct, the model produced repeated generations at inference time. Its performance on GPQA is poor, worse than the original qwen2.5-7b-instruct, while on the MATH dataset it is roughly on par. Do you think this phenomenon is related to model size? Have you run comparison experiments on smaller (<32B) models, and could you share them? Looking forward to your reply! @EliverQ
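For reference, here is a minimal sketch of the kind of SFT run and generation call described above, using Hugging Face TRL and Transformers. The dataset path, hyperparameters, and the use of a repetition penalty are illustrative assumptions, not the authors' released configuration.

```python
# Minimal sketch: SFT on the released long-CoT data, then sampling with a
# repetition penalty to probe the repeated-generation issue.
# All paths and hyperparameters below are illustrative assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# Hypothetical local copy of the released SFT data (chat-messages format).
train_data = load_dataset("json", data_files="sft_data.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    train_dataset=train_data,
    args=SFTConfig(
        output_dir="qwen2.5-7b-instruct-sft",
        num_train_epochs=3,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-5,
        bf16=True,
    ),
)
trainer.train()

# Inference: a mild repetition penalty sometimes suppresses degenerate loops
# after SFT on long reasoning traces (an assumption worth testing, not a fix
# claimed by the authors).
tokenizer = AutoTokenizer.from_pretrained("qwen2.5-7b-instruct-sft")
model = AutoModelForCausalLM.from_pretrained(
    "qwen2.5-7b-instruct-sft", device_map="auto"
)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Solve: what is 12 * 13?"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.05,
)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```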
Thanks for your interest! We have indeed run experiments on models of different sizes and families; please refer to this: #1 (comment)
The performance reported in the paper shows promising results, especially for the SFT setting.
My question is: can this imitation learning process be generalized to other models, such as Qwen2.5-7B, LLaMa3.1-8B, etc.? Since the imitation data are generated by QwQ-preview, which is based on Qwen2.5-32B, do they naturally benefit only Qwen2.5-32B, or can they be generalized to other models?
Thanks.