Unable to reproduce Table 7 from the paper #176
Comments
Hello, sorry for the delayed answer. Also, we use 8 GPUs, so if you are using 2 GPUs I think you should multiply the gradient accumulation by 4 (so 32 for base), which should be pretty equivalent, as I believe ST does not gather samples from the other GPUs. Finally, please also note that the released script is leveraging […]. As for the evaluation, we did not use the MTEB library directly but a custom script attached below.
This should work for every dataset except CQADupStack, which we loaded with […].
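On the batch-size point, here is a minimal sketch of the arithmetic (the reference accumulation of 8 on 8 GPUs is inferred from the "32 for base" figure; the helper name is made up for illustration and is not part of the repo):

```python
# Sketch: keep the per-step effective batch size constant when changing GPU count.
def scaled_grad_accum(reference_accum: int, reference_gpus: int, num_gpus: int) -> int:
    """Gradient accumulation needed on `num_gpus` to match the effective batch size
    of `reference_accum` steps on `reference_gpus` GPUs."""
    total = reference_accum * reference_gpus
    assert total % num_gpus == 0, "effective batch size must divide evenly across GPUs"
    return total // num_gpus

# "32 for base" on 2 GPUs implies a reference accumulation of 8 on 8 GPUs:
print(scaled_grad_accum(reference_accum=8, reference_gpus=8, num_gpus=2))  # -> 32
```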
Hi, I have been trying to replicate the BEIR scores in Table 7 from the paper. I used the train_st.py script as-is and trained on 2 GPUs for 1 epoch like this:
accelerate launch --num_processes num_gpu train_st.py
and then evaluated it on BEIR using the MTEB library. I used the hyperparameters suggested in Table 9 (lr 8e-5 for ModernBERT, 5e-5 for bert-base) and kept the remaining default values from the script. I am not able to replicate the numbers; any idea what could be the difference? Could you please list the hyperparameters you used: how many GPUs, what batch size, and any special arguments passed to MTEB?
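For reference, the evaluation call was roughly the following (a minimal sketch: the checkpoint path, output folder, and task subset are placeholders, not the exact configuration used):

```python
from sentence_transformers import SentenceTransformer
from mteb import MTEB

# Placeholder path to the checkpoint produced by train_st.py.
model = SentenceTransformer("output/my_checkpoint")

# A few BEIR retrieval tasks; the full Table 7 benchmark covers more datasets.
tasks = ["SciFact", "NFCorpus", "FiQA2018", "ArguAna"]

evaluation = MTEB(tasks=tasks)
evaluation.run(model, output_folder="results/my_checkpoint")
```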
Thanks!