Thanks for your great work!
I'm curious why researchers refer to such methods as "scaling test-time computation." From your blog, it seems there is no explicit scaling of test-time computation; rather, models trained with plain PPO simply tend to generate longer answers.
Because the model produces better results by generating many "thinking" tokens before answering, compared to answering ad hoc. Hence: more inference compute is traded for better accuracy.
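To make the trade-off concrete, here is a minimal sketch: vary the generation budget and check whether the answer is recovered. This assumes a Hugging Face causal LM; the model name and the toy string-match correctness check are illustrative, not from the blog.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice; any instruction-tuned causal LM works.
MODEL_NAME = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

prompt = "What is 17 * 24? Think step by step, then give the final answer."
inputs = tokenizer(prompt, return_tensors="pt")

# The generation budget is the knob that scales test-time compute:
# too few tokens and the reasoning chain is cut off before the answer.
for budget in (32, 128, 512):
    output = model.generate(
        **inputs,
        max_new_tokens=budget,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    completion = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    # Toy check: 17 * 24 = 408 must appear in the completion.
    print(f"budget={budget:4d} tokens -> correct={'408' in completion}")
```

The point of the sketch is that nothing about the model changes between runs; only the number of tokens it is allowed to "think" for does, which is exactly the inference-compute-for-accuracy trade described above.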