Citing this resource #414
Comments
Hi Yuval,
BTW, we are working on tidying up the tools provided in the project and would be interested to know more about how you used them. Any chance of getting extra details on your project?
Sure, we are pretraining transformer encoder-decoder models using large corpora (the Pile, Wikipedia, and RealNews), and used the modification and filtering tools to clean up the data (English only, not multilingual).
@yuvalkirstain, we are planning to get a Zenodo DOI for this GitHub repository.
Hello,
We are using this resource to filter pretraining data for our current project, and we would love to know if and how it should be cited.
Thanks :)