Yes, this meets our expectations, and we observe a similar score for Wombat-7B-gpt4 vs. ChatGPT.
The reason is that Wombat-7B uses 5 responses per query to train RRHF.
Although Wombat-7B-gpt4 uses better responses, it only contains 2 responses per query.
We think having more diverse responses per query is the most important factor when training with RRHF.
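To illustrate why the number of responses per query matters, here is a minimal sketch of RRHF's per-query ranking term as described in the RRHF paper: for every pair of responses where one has a lower reward score, a hinge penalty max(0, p_lo - p_hi) applies, where p is the response's length-normalized log-probability under the model being trained. The function name `rrhf_rank_loss` and the example scores are hypothetical, and the sketch omits the fine-tuning term on the best response, so it is not this repository's training code. With k responses there are up to k(k-1)/2 ranked pairs, so 5 responses per query give 10 comparisons while 2 give only one.

```python
from itertools import combinations


def rrhf_rank_loss(scores, logprobs):
    """Hypothetical sketch of the RRHF ranking term for a single query.

    scores:   reward scores for each candidate response (higher is better)
    logprobs: length-normalized log-probabilities of each response under the policy
    Returns (loss, number_of_ranked_pairs).
    """
    loss = 0.0
    n_pairs = 0
    for i, j in combinations(range(len(scores)), 2):
        # Ties carry no ranking signal, so skip them.
        if scores[i] == scores[j]:
            continue
        # `lo` is the lower-scored response, `hi` the higher-scored one.
        lo, hi = (i, j) if scores[i] < scores[j] else (j, i)
        # Penalize the model when it assigns the worse response a higher log-prob.
        loss += max(0.0, logprobs[lo] - logprobs[hi])
        n_pairs += 1
    return loss, n_pairs


# Five distinct-scored responses per query vs. two (illustrative numbers only).
_, pairs_five = rrhf_rank_loss([0.1, 0.4, 0.2, 0.9, 0.5], [-1.2, -0.8, -1.0, -0.5, -0.9])
_, pairs_two = rrhf_rank_loss([0.3, 0.8], [-1.0, -2.0])
print(pairs_five)  # 10
print(pairs_two)   # 1
```

More candidates per query means many more pairwise constraints on the model from the same prompt, which is one way to read the claim that response diversity matters more than the quality of a single pair.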
Another possible factor is that Wombat-7B uses responses sampled from its initial checkpoint, while Wombat-7B-gpt4 does not.
If RRHF works by improving on the model's own outputs, then leaving out responses from the initial checkpoint would hurt RRHF's performance.
Wombat-7B and Wombat-7B-gpt4: obtained using the script `recover_wombat_7b.sh`
According to the above results, Wombat-7B performs better than Wombat-7B-gpt4. Does this result meet expectations?